Abstract

We determine the rate region of the vector Gaussian one-helper source-coding problem under a covariance matrix distortion
constraint. The rate region is achieved by a simple scheme that separates the lossy vector quantization from the lossless
spatial compression. The converse is established by extending and combining three analysis techniques that have been
employed in the past to obtain partial results for the problem.

1 Introduction

We study the vector Gaussian one-helper source-coding problem† depicted in Fig. 1. Here $\mathbf{X}$ and $\mathbf{Y}$ are two jointly vector
Gaussian sources. Encoders 1 and 2 observe two i.i.d. strings distributed according to $\mathbf{X}$ and $\mathbf{Y}$, respectively, and separately
send messages to the decoder at rates $R_1$ and $R_2$ bits per observation, respectively, using noiseless channels. The decoder
uses both messages to estimate $\mathbf{X}$ such that a given distortion constraint on the average error covariance matrix is satisfied.
The goal is to determine the rate region of the problem, which is the set of all rate pairs $(R_1, R_2)$ that allow us to satisfy the
distortion constraint for some design of the encoders and the decoder.



Figure 1: Vector Gaussian one-helper source-coding problem.

Oohama [1] gave a complete characterization of the rate region for the case in which both sources are scalar. His
achievability proof is a Gaussian scheme that is described in more detail below. The converse argument uses the entropy-maximizing
property of the Gaussian distribution and the entropy power inequality (EPI), and it bears a certain resemblance to Bergmans'
earlier converse for the scalar Gaussian broadcast channel [2]. As such, one might hope that the channel enhancement
technique introduced by Weingarten et al. [3] to solve the MIMO Gaussian broadcast channel would be sufficient to solve the
problem considered here. This turns out not to be the case, however. Among other contributions, Liu and Viswanath [4]
showed that channel enhancement yields an outer bound for the vector one-helper problem that is not tight in general. This
was later improved slightly by the present authors to show that the Gaussian scheme achieves a portion of the boundary of the
rate region [5]. Liu and Viswanath's approach was later subsumed by Zhang [6], who applied enhancement in a different way
and called it source enhancement, but this also yielded an outer bound that is not always tight.



* Both authors are with the School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853 USA. (Email: mr534@cornell.edu,
wagner@ece.cornell.edu.)

† The material in this paper was presented in part at the 49th Annual Allerton Conference on Communications, Control, and Computing, University of
Illinois, Urbana-Champaign, Sept. 2011.

Figure 2: A Gaussian achievable scheme.



The case in which $Y$ is a scalar and $\mathbf{X}$ is a vector was recently solved by the authors [7]. The proof did not use
enhancement, but it did require a novel technique that we call distortion projection. Here we shall show that distortion projection,
source enhancement, and Oohama's converse technique together are sufficient to solve the general problem in which both $\mathbf{X}$
and $\mathbf{Y}$ are vectors. In particular, we shall determine the rate region exactly and show that a vector extension of the Gaussian
scheme used by Oohama is optimal. In this scheme, as depicted in Fig. 2, encoder 1 vector quantizes (VQ) its observations
using a Gaussian test channel as in point-to-point rate-distortion theory. It then compresses the quantized values using
Slepian-Wolf (SW) encoding [8]. Encoder 2 just vector quantizes its observations using another Gaussian test channel. The decoder
decodes the quantized values and estimates the observations of encoder 1 using a minimum mean-squared error (MMSE)
estimator.

The rest of the paper is organized as follows. Section 2 explains the notation used in the paper. In Section 3, we present
the mathematical formulation of the problem, a description of the scheme, and the statement of our main result. Section 4
gives an outline of the converse argument. Since the proof of the converse is somewhat involved, it is divided into Sections 5
through 8.



2 Notation 

We use uppercase to denote random variables and vectors. Boldface is used to distinguish vectors from scalars. Arbitrary
realizations of random variables and vectors are denoted in lowercase. For a random vector $\mathbf{X}$, $\mathbf{X}^n$ denotes an i.i.d. vector of
length $n$, $\mathbf{X}^n(i)$ denotes its $i$th component, and $\mathbf{X}^n(i:j)$ denotes the $i$th through $j$th components. The superscript $T$ denotes
matrix transpose. The covariance matrix of $\mathbf{X}$ is denoted by $\mathbf{K}_{\mathbf{X}}$. The conditional covariance matrix of $\mathbf{X}$ given $\mathbf{Y}$ is denoted
by $\mathbf{K}_{\mathbf{X}|\mathbf{Y}}$ and is defined as

\[ \mathbf{K}_{\mathbf{X}|\mathbf{Y}} \triangleq E\left[ (\mathbf{X} - E(\mathbf{X}|\mathbf{Y})) (\mathbf{X} - E(\mathbf{X}|\mathbf{Y}))^T \right]. \]

All vectors are column vectors and are $m$-dimensional, unless otherwise stated. We use $\mathbf{I}_m$ to denote an $m \times m$ identity
matrix. With a little abuse of notation, $\mathbf{0}$ is used to denote both zero vectors and zero matrices of appropriate dimensions. We
use $\mathrm{Diag}(d_1, d_2, \dots, d_p)$ to denote a diagonal matrix with diagonal entries $d_1, d_2, \dots, d_p$. The trace of a matrix $\mathbf{A}$ is denoted
by $\mathrm{Tr}(\mathbf{A})$. For two real symmetric matrices $\mathbf{A}$ and $\mathbf{B}$, $\mathbf{A} \succeq \mathbf{B}$ ($\mathbf{A} \succ \mathbf{B}$) means that $\mathbf{A} - \mathbf{B}$ is positive semidefinite (definite).
Similarly, $\mathbf{A} \preceq \mathbf{B}$ ($\mathbf{A} \prec \mathbf{B}$) means that $\mathbf{B} - \mathbf{A}$ is positive semidefinite (definite). All logarithms in this paper are to the base
2. The determinant of a matrix $\mathbf{K}$ is denoted by $|\mathbf{K}|$. The notation $\mathbf{X} \leftrightarrow \mathbf{Y} \leftrightarrow \mathbf{Z}$ means that $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{Z}$ form a Markov
chain in this order. We use $\mathrm{span}\{\mathbf{c}_i\}_{i=1}^{l}$ to denote the subspace spanned by $\{\mathbf{c}_i\}_{i=1}^{l}$.



3 Problem Formulation and Main Results 

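Let $\mathbf{X}$ and $\mathbf{Y}$ be two generic zero-mean jointly Gaussian random vectors with covariance matrices $\mathbf{K}_{\mathbf{X}}$ and $\mathbf{K}_{\mathbf{Y}}$, respectively.
Initially, we shall assume that $\mathbf{X}$ is $m$-dimensional and $\mathbf{Y}$ is $k$-dimensional. Let $\{(\mathbf{X}^n(i), \mathbf{Y}^n(i))\}_{i=1}^{n}$ be a sequence of i.i.d.
random vectors with the distribution at a single stage being the same as that of the generic pair $(\mathbf{X}, \mathbf{Y})$. As depicted in Fig. 1,
encoder 1 observes $\mathbf{X}^n$ and sends a message to the decoder using an encoding function

\[ f_1^{(n)} : \mathbb{R}^{mn} \mapsto \{1, \dots, M_1^{(n)}\}. \]

Analogously, encoder 2 observes $\mathbf{Y}^n$ and sends a message to the decoder using another encoding function

\[ f_2^{(n)} : \mathbb{R}^{kn} \mapsto \{1, \dots, M_2^{(n)}\}. \]

The decoder uses both received messages to estimate $\mathbf{X}^n$ using a decoding function

\[ g^{(n)} : \{1, \dots, M_1^{(n)}\} \times \{1, \dots, M_2^{(n)}\} \mapsto \mathbb{R}^{mn}. \]

Definition 1. A rate-distortion vector $(R_1, R_2, \mathbf{D})$ is achievable for the vector Gaussian one-helper source-coding problem
if there exist a block length $n$, encoding functions $f_1^{(n)}$ and $f_2^{(n)}$, and a decoding function $g^{(n)}$ such that

\[ R_i \ge \frac{1}{n} \log M_i^{(n)} \quad \text{for all } i \in \{1, 2\}, \text{ and} \]

\[ \mathbf{D} \succeq \frac{1}{n} \sum_{i=1}^{n} E\left[ \left( \mathbf{X}^n(i) - \hat{\mathbf{X}}^n(i) \right) \left( \mathbf{X}^n(i) - \hat{\mathbf{X}}^n(i) \right)^T \right], \]

where

\[ \hat{\mathbf{X}}^n \triangleq g^{(n)}\left( f_1^{(n)}(\mathbf{X}^n), f_2^{(n)}(\mathbf{Y}^n) \right). \]

Let $\mathcal{RD}$ be the set of all achievable rate-distortion vectors and $\overline{\mathcal{RD}}$ be its closure. Define

\[ \mathcal{R}(\mathbf{D}) \triangleq \left\{ (R_1, R_2) : (R_1, R_2, \mathbf{D}) \in \overline{\mathcal{RD}} \right\}. \]

We call $\mathcal{R}(\mathbf{D})$ the rate region for the vector Gaussian one-helper source-coding problem.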

Our goal is to characterize the rate region TZ(D). Note that the matrix distortion constraint is more general in the sense 
that it subsumes other natural distortion constraints such as a finite number of upper bounds on the mean square error of 
reproductions of linear functions of the source. In particular, it subsumes the case in which the distortion constraint is on the 
mean square error of reproductions of the components of X. 

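Since we are interested in a quadratic distortion constraint, without loss of generality we can restrict the decoding function
to be the MMSE estimate of $\mathbf{X}^n$ based on the received messages. Therefore, $\hat{\mathbf{X}}^n$ can be written as

\[ \hat{\mathbf{X}}^n = E\left[ \mathbf{X}^n \,\middle|\, f_1^{(n)}(\mathbf{X}^n), f_2^{(n)}(\mathbf{Y}^n) \right]. \]

We can assume without loss of generality¹ that

\[ \mathbf{X} = \mathbf{Y} + \mathbf{N}, \]

where $\mathbf{N}$ is a zero-mean Gaussian random vector with the covariance matrix $\mathbf{K}_{\mathbf{N}}$ and is independent of $\mathbf{Y}$. The case in which
$\mathbf{K}_{\mathbf{X}} \preceq \mathbf{D}$ has a trivial solution. In this case, the rate region is the entire nonnegative quadrant. So, we assume that $\mathbf{K}_{\mathbf{X}} \preceq \mathbf{D}$
does not hold in the rest of the paper. This means that there exists a direction $\mathbf{z} \neq \mathbf{0}$ such that

\[ \mathbf{z}^T \mathbf{K}_{\mathbf{X}} \mathbf{z} > \mathbf{z}^T \mathbf{D} \mathbf{z}. \quad (1) \]

For now, we assume that $\mathbf{K}_{\mathbf{X}}$, $\mathbf{K}_{\mathbf{Y}}$, and $\mathbf{D}$ are positive definite. The general case of the problem will be addressed in Section 8.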

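¹ Since $\mathbf{X}$ and $\mathbf{Y}$ are jointly Gaussian, we can write

\[ \mathbf{X} = \mathbf{A}\mathbf{Y} + \mathbf{N}, \]

where $\mathbf{A}$ is an $m \times k$ matrix and $\mathbf{N}$ is an $m$-dimensional zero-mean Gaussian random vector that is independent of $\mathbf{Y}$. Since there is no distortion constraint
on $\mathbf{Y}$, and $\mathbf{A}\mathbf{Y}$ is a sufficient statistic for $\mathbf{X}$ given $\mathbf{Y}$ (i.e., $\mathbf{X} \leftrightarrow \mathbf{Y} \leftrightarrow \mathbf{A}\mathbf{Y}$ and $\mathbf{X} \leftrightarrow \mathbf{A}\mathbf{Y} \leftrightarrow \mathbf{Y}$), we can relabel $\mathbf{A}\mathbf{Y}$ as $\mathbf{Y}$ and write

\[ \mathbf{X} = \mathbf{Y} + \mathbf{N}. \]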
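
The reduction to $\mathbf{X} = \mathbf{Y} + \mathbf{N}$ can be illustrated numerically. The following is a minimal sketch, not part of the original development: the function name `helper_decomposition` and the randomly generated joint covariance are hypothetical, and the sketch assumes $\mathbf{K}_{\mathbf{Y}}$ is invertible. It computes the linear MMSE coefficient $\mathbf{A} = \mathbf{K}_{\mathbf{X}\mathbf{Y}} \mathbf{K}_{\mathbf{Y}}^{-1}$ and the covariance of the residual $\mathbf{N} = \mathbf{X} - \mathbf{A}\mathbf{Y}$.

```python
import numpy as np

def helper_decomposition(K_X, K_Y, K_XY):
    """Return (A, K_N) such that X = A Y + N with N uncorrelated with Y.

    K_XY is the m x k cross-covariance E[X Y^T]; K_Y is assumed invertible.
    """
    # A = K_XY K_Y^{-1}, computed with a linear solve (K_Y is symmetric).
    A = np.linalg.solve(K_Y, K_XY.T).T
    # Covariance of the residual N = X - A Y.
    K_N = K_X - A @ K_Y @ A.T
    return A, K_N

# Hypothetical joint covariance of (X, Y) with m = 3 and k = 2.
rng = np.random.default_rng(0)
G = rng.standard_normal((5, 5))
K = G @ G.T + np.eye(5)
K_X, K_XY, K_Y = K[:3, :3], K[:3, 3:], K[3:, 3:]

A, K_N = helper_decomposition(K_X, K_Y, K_XY)
K_AY = A @ K_Y @ A.T  # covariance of the relabeled helper source A Y
```

After relabeling $\mathbf{A}\mathbf{Y}$ as $\mathbf{Y}$, the covariances satisfy $\mathbf{K}_{\mathbf{X}} = \mathbf{K}_{\mathbf{A}\mathbf{Y}} + \mathbf{K}_{\mathbf{N}}$. Note that `K_AY` is singular whenever $k < m$, which is one reason the positive definite assumption is made only provisionally in the text.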
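3.1 Rate Region

The rate region $\mathcal{R}(\mathbf{D})$ is a closed convex set in the nonnegative quadrant. It is closed by definition and is convex because any
convex combination of two points in the rate region can be achieved by time-sharing between the
encoding and decoding strategies of the two points. Therefore, we can characterize it completely by its supporting hyperplanes,
which can be expressed as the following optimization problem:

\[ \mathcal{R}(\mathbf{D}, \mu) \triangleq \inf_{(R_1, R_2) \in \mathcal{R}(\mathbf{D})} \mu R_1 + R_2, \]

where $\mu$ is a nonnegative real number. Let us define

\[ \mathcal{R}^*(\mathbf{D}, \mu) \triangleq \begin{cases} v(P_{pt\text{-}pt}) & \text{if } 0 \le \mu \le 1 \\ v(P_{G1}) & \text{if } \mu > 1, \end{cases} \]

where $v(P_{pt\text{-}pt})$ and $v(P_{G1})$ are the optimal values of the optimization problems $(P_{pt\text{-}pt})$ and $(P_{G1})$, respectively, which
are defined as

\[ (P_{pt\text{-}pt}) \triangleq \min_{\mathbf{K}_{\mathbf{X}|\mathbf{U}}} \; \frac{\mu}{2} \log \frac{|\mathbf{K}_{\mathbf{X}}|}{|\mathbf{K}_{\mathbf{X}|\mathbf{U}}|} \]
\[ \text{subject to} \quad \mathbf{K}_{\mathbf{X}} \succeq \mathbf{K}_{\mathbf{X}|\mathbf{U}} \succeq \mathbf{0} \text{ and } \mathbf{D} \succeq \mathbf{K}_{\mathbf{X}|\mathbf{U}}, \]

and

\[ (P_{G1}) \triangleq \min_{\mathbf{K}_{\mathbf{Y}|\mathbf{V}},\, \mathbf{K}_{\mathbf{X}|\mathbf{U},\mathbf{V}}} \; \frac{\mu}{2} \log \frac{|\mathbf{K}_{\mathbf{Y}|\mathbf{V}} + \mathbf{K}_{\mathbf{N}}|}{|\mathbf{K}_{\mathbf{X}|\mathbf{U},\mathbf{V}}|} + \frac{1}{2} \log \frac{|\mathbf{K}_{\mathbf{Y}}|}{|\mathbf{K}_{\mathbf{Y}|\mathbf{V}}|} \]
\[ \text{subject to} \quad \mathbf{K}_{\mathbf{Y}} \succeq \mathbf{K}_{\mathbf{Y}|\mathbf{V}} \succeq \mathbf{0}, \quad \mathbf{K}_{\mathbf{Y}|\mathbf{V}} + \mathbf{K}_{\mathbf{N}} \succeq \mathbf{K}_{\mathbf{X}|\mathbf{U},\mathbf{V}} \succeq \mathbf{0}, \text{ and } \mathbf{D} \succeq \mathbf{K}_{\mathbf{X}|\mathbf{U},\mathbf{V}}. \]

We use similar notation to denote other optimization problems and their optimal values throughout the paper. The main result
of this paper is the following theorem.

Theorem 1. The minimum weighted sum rate for the vector Gaussian one-helper source-coding problem is given by the
solution to the above matrix optimization problem, i.e.,

\[ \mathcal{R}(\mathbf{D}, \mu) = \mathcal{R}^*(\mathbf{D}, \mu). \]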



3.2 A Gaussian Achievable Scheme 

In this subsection, we present a Gaussian achievable scheme (Fig. 2). The scheme is well-known and is sometimes referred
to as the Berger-Tung scheme [9], [10]. This scheme is known to be optimal for several problems in the Gaussian multiterminal
source-coding literature [1], [11], [12], [13], [14], [15], [16]. However, it is not optimal in some cases. For instance, a lattice-based
scheme can outperform it if the goal is to reconstruct a hidden random vector that is jointly Gaussian with $\mathbf{X}$ and $\mathbf{Y}$ [17], [18],
and the discrete memoryless version of the scheme can be suboptimal if the sources have common components [19]. For the
problem under consideration, however, we shall prove that the Berger-Tung scheme is indeed optimal. We present an overview
of the scheme here. The details for similar problem setups can be found in |[Tl [m . 
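Let $\mathcal{S}$ be the set of zero-mean jointly Gaussian random vectors $\mathbf{U}$ and $\mathbf{V}$ such that

(C1) $\mathbf{U}$, $\mathbf{X}$, $\mathbf{Y}$, and $\mathbf{V}$ form a Markov chain $\mathbf{U} \leftrightarrow \mathbf{X} \leftrightarrow \mathbf{Y} \leftrightarrow \mathbf{V}$, and
(C2) $\mathbf{K}_{\mathbf{X}|\mathbf{U},\mathbf{V}} \preceq \mathbf{D}$.

Consider any $(\mathbf{U}, \mathbf{V}) \in \mathcal{S}$ and a large block length $n$. Let $R_1' = I(\mathbf{X}; \mathbf{U}) + \epsilon$, where $\epsilon > 0$. To construct the codebook for
encoder 1, first generate $2^{nR_1'}$ independent codewords $\mathbf{U}^n$ randomly according to the marginal distribution of $\mathbf{U}$, and then
uniformly distribute them into $2^{nR_1}$ bins. Encoder 2's codebook is constructed by generating $2^{nR_2}$ independent codewords
$\mathbf{V}^n$ randomly according to the marginal distribution of $\mathbf{V}$.

Given a source sequence $\mathbf{X}^n$, encoder 1 looks for a codeword $\mathbf{U}^n$ that is jointly typical with $\mathbf{X}^n$, and sends the index $b$ of
the bin to which $\mathbf{U}^n$ belongs. Encoder 2, upon observing $\mathbf{Y}^n$, sends the index of the codeword $\mathbf{V}^n$ that is jointly typical with
$\mathbf{Y}^n$. The decoder receives the two indices, then looks into the bin $b$ for a codeword $\mathbf{U}^n$ that is jointly typical with $\mathbf{V}^n$. The
decoder can recover $\mathbf{U}^n$ and $\mathbf{V}^n$ with high probability as long as

\[ R_1 > I(\mathbf{X}; \mathbf{U} | \mathbf{V}) \text{ and } R_2 > I(\mathbf{Y}; \mathbf{V}). \]

The decoder then computes the MMSE estimate of the source $\mathbf{X}^n$ given the messages $\mathbf{U}^n$ and $\mathbf{V}^n$, and (C2) above guarantees
that this estimate will satisfy the covariance matrix distortion constraint. Let

\[ \mathcal{R}_G(\mathbf{D}) \triangleq \left\{ (R_1, R_2) : \text{there exists } (\mathbf{U}, \mathbf{V}) \in \mathcal{S} \text{ such that } R_1 \ge I(\mathbf{X}; \mathbf{U} | \mathbf{V}) \text{ and } R_2 \ge I(\mathbf{Y}; \mathbf{V}) \right\}. \]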
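
The rate pair associated with a particular Gaussian test channel can be evaluated in closed form. The following is a minimal sketch under stated assumptions: the additive test channels $\mathbf{U} = \mathbf{X} + \mathbf{Z}_1$ and $\mathbf{V} = \mathbf{Y} + \mathbf{Z}_2$, with $\mathbf{Z}_1$ and $\mathbf{Z}_2$ independent Gaussian noises, are one hypothetical way to satisfy (C1), and the function names are illustrative only. All vectors are taken to be $m$-dimensional with $\mathbf{X} = \mathbf{Y} + \mathbf{N}$.

```python
import numpy as np

def log2det(K):
    """log2 of the determinant of a positive definite matrix."""
    sign, logabs = np.linalg.slogdet(K)
    if sign <= 0:
        raise ValueError("matrix must be positive definite")
    return logabs / np.log(2.0)

def cond_cov(K_prior, K_noise):
    """Covariance of Z given Z + W, where W ~ N(0, K_noise) is independent of Z."""
    return K_prior - K_prior @ np.linalg.solve(K_prior + K_noise, K_prior)

def berger_tung_rates(K_Y, K_N, K_Z1, K_Z2):
    """Rates I(X;U|V), I(Y;V) in bits and K_{X|U,V} for U = X + Z1, V = Y + Z2."""
    K_YgV = cond_cov(K_Y, K_Z2)      # K_{Y|V}
    K_XgV = K_YgV + K_N              # K_{X|V}, since N is independent of (Y, V)
    K_XgUV = cond_cov(K_XgV, K_Z1)   # K_{X|U,V}
    R1 = 0.5 * (log2det(K_XgV) - log2det(K_XgUV))
    R2 = 0.5 * (log2det(K_Y) - log2det(K_YgV))
    return R1, R2, K_XgUV

def meets_distortion(K_XgUV, D, tol=1e-9):
    """Check condition (C2): D - K_{X|U,V} is positive semidefinite."""
    return bool(np.all(np.linalg.eigvalsh(D - K_XgUV) >= -tol))

# Scalar sanity check with unit variances everywhere.
one = np.eye(1)
R1, R2, K_XgUV = berger_tung_rates(one, one, one, one)
```

In the scalar example, $K_{Y|V} = 0.5$, $K_{X|V} = 1.5$, and $K_{X|U,V} = 0.6$, so the pair $(R_1, R_2) = (\tfrac{1}{2}\log 2.5, \tfrac{1}{2})$ lies in $\mathcal{R}_G(\mathbf{D})$ for any $\mathbf{D} \succeq 0.6$.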

Furthermore, define

\[ \mathcal{R}_G(\mathbf{D}, \mu) \triangleq \min_{(R_1, R_2) \in \mathcal{R}_G(\mathbf{D})} \mu R_1 + R_2. \]

The following lemma gives the weighted sum-rate achieved by this scheme.

Lemma 1. The Gaussian achievable scheme achieves $\mathcal{R}_G(\mathbf{D}, \mu)$ and

\[ \mathcal{R}_G(\mathbf{D}, \mu) = \mathcal{R}^*(\mathbf{D}, \mu). \]

Proof. It follows immediately that the Gaussian achievable scheme achieves $\mathcal{R}_G(\mathbf{D}, \mu)$. The equality in Lemma 1 is proved
in Appendix A. □

Lemma 1 implies that

\[ \mathcal{R}(\mathbf{D}, \mu) \le \mathcal{R}^*(\mathbf{D}, \mu). \]

We prove the reverse inequality (converse) next. Since the proof is rather long, we divide it into sections. The next section
gives a nonrigorous overview of the argument. In the following section, we study the optimization problem $(P_{G1})$ in the
definition of $\mathcal{R}^*(\mathbf{D}, \mu)$ and establish several properties that its optimal solution satisfies. We use these properties in Section 6
to prove the main result needed for the converse. We finally complete the proof of Theorem 1 in Section 7.



4 Overview of the Converse Argument 

The starting point of our proof is Oohama's converse for the scalar case, which proceeds as follows. Let $f_1^{(n)}$ and $f_2^{(n)}$ be
encoding functions and $g^{(n)}$ be a decoding function that achieve the rate-distortion vector $(R_1, R_2, D)$. Let $C_1 = f_1^{(n)}(X^n)$
and $C_2 = f_2^{(n)}(Y^n)$. By standard steps, we have

\[ nR_2 \ge \log M_2^{(n)} \ge H(C_2) = I(Y^n; C_2), \]

where the last equality holds because $C_2$ is a function of $Y^n$. Likewise, we have

\[ nR_1 \ge \log M_1^{(n)} \ge H(C_1 | C_2) \ge I(X^n; C_1 | C_2) = I(X^n; C_1, C_2) - I(X^n; C_2). \]

It follows that

\[ nR_1 \ge \inf_{C_1, C_2} \; I(X^n; C_1, C_2) - I(X^n; C_2) \]
\[ \text{subject to} \quad \sum_{i=1}^{n} E\left[ \left( X^n(i) - E[X^n(i) | C_1, C_2] \right)^2 \right] \le nD, \quad (2) \]
\[ I(Y^n; C_2) \le nR_2, \text{ and} \]
\[ X^n \leftrightarrow Y^n \leftrightarrow C_2. \]

Now this infimum can be lower bounded by separately optimizing each term:

\[ nR_1 \ge \left[ \inf_{C_1, C_2} I(X^n; C_1, C_2) \;\; \text{subject to} \;\; \sum_{i=1}^{n} E\left[ \left( X^n(i) - E[X^n(i) | C_1, C_2] \right)^2 \right] \le nD \right] \]
\[ \qquad - \left[ \sup_{C_2} I(X^n; C_2) \;\; \text{subject to} \;\; I(Y^n; C_2) \le nR_2 \text{ and } X^n \leftrightarrow Y^n \leftrightarrow C_2 \right]. \quad (3) \]

The first optimization problem,

\[ \inf_{C_1, C_2} I(X^n; C_1, C_2) \]
\[ \text{subject to} \quad \sum_{i=1}^{n} E\left[ \left( X^n(i) - E[X^n(i) | C_1, C_2] \right)^2 \right] \le nD, \]

which we call the distortion problem, can be solved using the entropy-maximizing property of the Gaussian distribution and
the concavity of the logarithm. The second problem,

\[ \sup_{C_2} I(X^n; C_2) \]
\[ \text{subject to} \quad I(Y^n; C_2) \le nR_2 \text{ and} \quad (4) \]
\[ X^n \leftrightarrow Y^n \leftrightarrow C_2, \]

which we call the helper problem, can be solved via the conditional version of the entropy power inequality [2]. Substituting
these solutions into (3) yields exactly the $R_1$ achieved by the scheme from the previous section for the given $R_2$ and $D$. This
completes Oohama's converse proof for the scalar case.
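
For concreteness, the scalar bound obtained from (3) can be evaluated numerically. The sketch below is an illustration under stated assumptions rather than a formula quoted from this paper: it uses the standard closed-form solutions of the two subproblems for $X = Y + N$, namely $\frac{1}{2}\log(\sigma_X^2/D)$ per sample for the distortion problem and $\frac{1}{2}\log\left(\sigma_X^2 / (\sigma_N^2 + \sigma_Y^2 2^{-2R_2})\right)$ per sample for the helper problem. The function name `oohama_r1` is hypothetical.

```python
import math

def oohama_r1(var_y, var_n, R2, D):
    """Minimum R1 (bits per sample) for scalar X = Y + N, helper rate R2, distortion D.

    Computed as (distortion-problem value) - (helper-problem value), clipped at zero.
    """
    var_x = var_y + var_n
    # Per-sample value of the distortion problem: (1/2) log(var_x / D).
    distortion_term = 0.5 * math.log2(var_x / D)
    # Per-sample value of the helper problem, via the conditional EPI:
    # the helper can reduce the variance of Y to var_y * 2^(-2 R2) at best.
    helper_term = 0.5 * math.log2(var_x / (var_n + var_y * 2.0 ** (-2.0 * R2)))
    return max(0.0, distortion_term - helper_term)

# Example: unit-variance Y and N, helper rate 0.5 bit, target distortion 0.6.
r1 = oohama_r1(1.0, 1.0, 0.5, 0.6)
```

With these hypothetical numbers the helper leaves a residual variance of $1 + 2^{-1} = 1.5$, so $R_1 = \frac{1}{2}\log(1.5/0.6) = \frac{1}{2}\log 2.5$. Increasing $R_2$ lowers the required $R_1$, but never below $\frac{1}{2}\log^{+}(\sigma_N^2/D)$.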

The key to Oohama's proof is that separately minimizing the two terms in (2) does not decrease the objective. More
precisely, for any pair $(C_1^*, C_2^*)$ that achieves the infimum in (2), we have

\[ I(X^n; C_1^*, C_2^*) = \inf_{C_1, C_2} I(X^n; C_1, C_2) \]
\[ \text{subject to} \quad \sum_{i=1}^{n} E\left[ \left( X^n(i) - E[X^n(i) | C_1, C_2] \right)^2 \right] \le nD, \quad (5) \]

and

\[ I(X^n; C_2^*) = \sup_{C_2} I(X^n; C_2) \]
\[ \text{subject to} \quad I(Y^n; C_2) \le nR_2 \text{ and} \quad (6) \]
\[ X^n \leftrightarrow Y^n \leftrightarrow C_2. \]

Whenever (5) occurs, we shall say that the distortion problem incurs no loss. Whenever (6) occurs, we shall say that the
helper problem incurs no loss.

It is not difficult to verify that this proof also works when $X$ is a scalar and $\mathbf{Y}$ is a vector. In particular, both the distortion
and helper problems incur no loss in this case. When both $\mathbf{X}$ and $\mathbf{Y}$ are vectors, the proof breaks down in three places:

1. The distortion problem incurs a loss in general. For instance, if $\mathbf{D} \preceq \mathbf{K}_{\mathbf{X}}$, then the distortion problem is solved by
choosing $C_1$ and $C_2$ so that

\[ \sum_{i=1}^{n} E\left[ \left( \mathbf{X}^n(i) - E[\mathbf{X}^n(i) | C_1, C_2] \right) \left( \mathbf{X}^n(i) - E[\mathbf{X}^n(i) | C_1, C_2] \right)^T \right] = n\mathbf{D}. \]

That is, the constraint is met with equality. For the original problem in (2), on the other hand, even if $\mathbf{D} \preceq \mathbf{K}_{\mathbf{X}}$ we can
only guarantee that

\[ \sum_{i=1}^{n} E\left[ \left( \mathbf{X}^n(i) - E[\mathbf{X}^n(i) | C_1^*, C_2^*] \right) \left( \mathbf{X}^n(i) - E[\mathbf{X}^n(i) | C_1^*, C_2^*] \right)^T \right] \preceq n\mathbf{D}, \]

and equality does not hold in general. The lack of equality is easiest to see when $\mathbf{K}_{\mathbf{Y}}$ is poorly conditioned. If $\mathbf{K}_{\mathbf{Y}}$
has essentially one nonzero eigenvalue, then the helper will allocate all of its rate in the direction of the associated
eigenvector. If $R_2$ is large, this could result in "overshooting" the distortion constraint in that direction.

2. The helper problem also incurs a loss in general. One way of seeing this is to note that if the goal is only to maximize
the mutual information in (4), then one might choose $C_2$ to favor a direction along which the distortion constraint $\mathbf{D}$ is
not active over one for which it is. This would necessarily deviate from the optimizer $C_2^*$ of the original problem.

3. The vector EPI does not solve the helper problem in general.

To address the first issue, observe that the distortion problem incurs no loss if the optimizers $C_1^*$ and $C_2^*$ for the original
problem happen to meet the distortion constraint with equality, i.e., it holds that

\[ \sum_{i=1}^{n} E\left[ \left( \mathbf{X}^n(i) - E[\mathbf{X}^n(i) | C_1^*, C_2^*] \right) \left( \mathbf{X}^n(i) - E[\mathbf{X}^n(i) | C_1^*, C_2^*] \right)^T \right] = n\mathbf{D}. \]

In prior work [7], we showed that it is possible to reduce the general case to this one by projecting the source and the distortion
constraint in the directions in which the distortion constraint is met with equality for the candidate optimal scheme. We call
this process distortion projection. This addresses the first issue. One can verify that if $\mathbf{X}$ is a vector and $Y$ is a scalar, then the
second and third issues do not arise, and hence distortion projection together with Oohama's converse arguments is sufficient
to solve the problem [7].

Liu and Viswanath [4] showed that the channel enhancement technique of Weingarten et al. [3] is sufficient to solve the
helper problem in the vector case, thereby addressing the third issue. Their solution, however, is not sufficient to handle the
second issue. Recently, Zhang [6] introduced a variation on the enhancement idea called source enhancement that subsumes
Liu and Viswanath's approach. Source enhancement effectively replaces the original problem with a relaxation for which the
helper problem incurs no loss and the vector EPI solves the helper problem, although Zhang does not describe it in this way.
This addresses the second and third issues. Thus it appears that distortion projection, source enhancement, and Oohama's
converse technique together should be sufficient to solve the case in which both $\mathbf{X}$ and $\mathbf{Y}$ are vectors. We shall show that this
is indeed true. Source enhancement and Oohama's converse technique are lifted directly from [1], [6]. The distortion projection,
on the other hand, requires an extension beyond what was needed in the scalar helper case [7]. This extension requires us to
first establish several properties of the optimal Gaussian solution to the problem, to which we turn next.



5 Properties of the Optimal Gaussian Solution 



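In this section, we study the optimization problem $(P_{G1})$ defined in Section 3.1. Note first that the constraints

\[ \mathbf{K}_{\mathbf{Y}|\mathbf{V}} \succeq \mathbf{0} \text{ and } \mathbf{K}_{\mathbf{X}|\mathbf{U},\mathbf{V}} \succeq \mathbf{0} \]

are never active because otherwise the objective value is infinite. We therefore ignore these constraints in the study of the
problem. Now, instead of studying $(P_{G1})$ directly, we study an equivalent formulation. This formulation is implicit
in [6]. Note that if $\mathbf{K}_{\mathbf{Y}|\mathbf{V}}$ and $\mathbf{K}_{\mathbf{X}|\mathbf{U},\mathbf{V}}$ are feasible for $(P_{G1})$, then there exist two positive semidefinite matrices $\mathbf{B}_1$ and $\mathbf{B}_2$
such that

\[ \mathbf{K}_{\mathbf{Y}|\mathbf{V}} = \mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2, \]
\[ \mathbf{K}_{\mathbf{X}|\mathbf{U},\mathbf{V}} = \mathbf{K}_{\mathbf{Y}|\mathbf{V}} + \mathbf{K}_{\mathbf{N}} - \mathbf{B}_1 = \mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2 + \mathbf{K}_{\mathbf{N}} - \mathbf{B}_1 = \mathbf{K}_{\mathbf{X}} - \mathbf{B}_1 - \mathbf{B}_2, \text{ and} \]
\[ \mathbf{K}_{\mathbf{X}} - \mathbf{B}_1 - \mathbf{B}_2 \preceq \mathbf{D}. \]

Therefore, $(P_{G1})$ is equivalent to the following problem:

\[ (P_{G2}) \triangleq \min_{\mathbf{B}_1, \mathbf{B}_2} \; \frac{\mu}{2} \log \frac{|\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2|}{|\mathbf{K}_{\mathbf{X}} - \mathbf{B}_1 - \mathbf{B}_2|} + \frac{1}{2} \log \frac{|\mathbf{K}_{\mathbf{Y}}|}{|\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2|} \]
\[ \text{subject to} \quad \mathbf{B}_i \succeq \mathbf{0} \text{ for all } i \in \{1, 2\}, \text{ and } \mathbf{D} \succeq \mathbf{K}_{\mathbf{X}} - \mathbf{B}_1 - \mathbf{B}_2. \]

We next establish several properties that the optimal solution to $(P_{G2})$ satisfies.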
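
The change of variables from $(\mathbf{K}_{\mathbf{Y}|\mathbf{V}}, \mathbf{K}_{\mathbf{X}|\mathbf{U},\mathbf{V}})$ to $(\mathbf{B}_1, \mathbf{B}_2)$ can be checked numerically. The following is a minimal sketch with hypothetical function names and hypothetical matrices; it evaluates both objectives at a matched point and tests feasibility for $(P_{G2})$. It does not solve either optimization problem.

```python
import numpy as np

def log2det(K):
    """log2 of the determinant of a positive definite matrix."""
    sign, logabs = np.linalg.slogdet(K)
    if sign <= 0:
        raise ValueError("matrix must be positive definite")
    return logabs / np.log(2.0)

def is_psd(M, tol=1e-9):
    """True if the symmetric matrix M is positive semidefinite up to tol."""
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

def pg1_objective(K_Y, K_N, K_YgV, K_XgUV, mu):
    """Objective of (P_G1) as a function of K_{Y|V} and K_{X|U,V}."""
    return (0.5 * mu * (log2det(K_YgV + K_N) - log2det(K_XgUV))
            + 0.5 * (log2det(K_Y) - log2det(K_YgV)))

def pg2_objective(K_Y, K_N, B1, B2, mu):
    """Objective of (P_G2) as a function of B1 and B2, with K_X = K_Y + K_N."""
    K_X = K_Y + K_N
    return (0.5 * mu * (log2det(K_X - B2) - log2det(K_X - B1 - B2))
            + 0.5 * (log2det(K_Y) - log2det(K_Y - B2)))

def pg2_feasible(K_Y, K_N, B1, B2, D):
    """Constraints of (P_G2): B1, B2 PSD and D - (K_X - B1 - B2) PSD."""
    K_X = K_Y + K_N
    return is_psd(B1) and is_psd(B2) and is_psd(D - (K_X - B1 - B2))

# Hypothetical 2 x 2 example.
K_Y, K_N, mu = 2.0 * np.eye(2), np.eye(2), 2.0
B1, B2 = np.diag([0.5, 1.0]), 0.5 * np.eye(2)
D = np.diag([2.0, 1.5])

K_YgV = K_Y - B2               # K_{Y|V}
K_XgUV = K_Y + K_N - B1 - B2   # K_{X|U,V}
v1 = pg1_objective(K_Y, K_N, K_YgV, K_XgUV, mu)
v2 = pg2_objective(K_Y, K_N, B1, B2, mu)
```

In this example $\mathbf{K}_{\mathbf{X}} - \mathbf{B}_1 - \mathbf{B}_2 = \mathrm{Diag}(2, 1.5) = \mathbf{D}$, so the distortion constraint is met with equality and the two objective values agree.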

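Since $(P_{G2})$ has a continuous objective and a compact feasible set, there exists an optimal solution $(\mathbf{B}_1^*, \mathbf{B}_2^*)$ to it. The
Lagrangian of the problem is [20, Sec. 5.9.1]

\[ \frac{\mu}{2} \log \frac{|\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2|}{|\mathbf{K}_{\mathbf{X}} - \mathbf{B}_1 - \mathbf{B}_2|} + \frac{1}{2} \log \frac{|\mathbf{K}_{\mathbf{Y}}|}{|\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2|} - \mathrm{Tr}\left( \mathbf{B}_1 \mathbf{M}_1 + \mathbf{B}_2 \mathbf{M}_2 - \mathbf{\Lambda} (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_1 - \mathbf{B}_2 - \mathbf{D}) \right), \]

where $\mathbf{M}_1$, $\mathbf{M}_2$, and $\mathbf{\Lambda}$ are positive semidefinite Lagrange multiplier matrices corresponding to the constraints $\mathbf{B}_1 \succeq \mathbf{0}$, $\mathbf{B}_2 \succeq \mathbf{0}$,
and $\mathbf{D} \succeq \mathbf{K}_{\mathbf{X}} - \mathbf{B}_1 - \mathbf{B}_2$, respectively. The KKT conditions for this problem are [20, Sec. 5.9.2]

\[ \frac{\mu}{2} (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_1^* - \mathbf{B}_2^*)^{-1} - \mathbf{\Lambda}^* - \mathbf{M}_1^* = \mathbf{0}, \quad (7) \]
\[ \frac{\mu}{2} (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_1^* - \mathbf{B}_2^*)^{-1} - \frac{\mu}{2} (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*)^{-1} + \frac{1}{2} (\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2^*)^{-1} - \mathbf{\Lambda}^* - \mathbf{M}_2^* = \mathbf{0}, \quad (8) \]
\[ \mathbf{B}_i^* \mathbf{M}_i^* = \mathbf{0}, \text{ for all } i \in \{1, 2\}, \quad (9) \]
\[ (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_1^* - \mathbf{B}_2^* - \mathbf{D}) \mathbf{\Lambda}^* = \mathbf{0}, \text{ and} \quad (10) \]
\[ \mathbf{M}_1^*, \mathbf{M}_2^*, \mathbf{\Lambda}^* \succeq \mathbf{0}, \quad (11) \]

where $\mathbf{M}_1^*$, $\mathbf{M}_2^*$, and $\mathbf{\Lambda}^*$ are optimal Lagrange multiplier matrices. Conditions (7) and (8) are obtained by setting the
gradients of the Lagrangian with respect to $\mathbf{B}_1$ and $\mathbf{B}_2$, respectively, to zero. Conditions (9) through (10) are slackness conditions on the
Lagrange multiplier matrices. We next establish that these KKT conditions must hold at $(\mathbf{B}_1^*, \mathbf{B}_2^*)$.

Lemma 2. There exist matrices $\mathbf{M}_1^*$, $\mathbf{M}_2^*$, and $\mathbf{\Lambda}^*$ that satisfy the KKT conditions (7)-(11).

Proof. See Appendix B. □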
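Let us define

\[ \mathbf{\Delta}^* \triangleq \mathbf{\Lambda}^* - \frac{\mu}{2} \left[ (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_1^* - \mathbf{B}_2^*)^{-1} - (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*)^{-1} \right]. \]

It follows from conditions (7) and (8) that

\[ \mathbf{\Delta}^* = \frac{\mu}{2} (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*)^{-1} - \mathbf{M}_1^* = \frac{1}{2} (\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2^*)^{-1} - \mathbf{M}_2^*. \quad (12) \]

We have the following lemma.

Lemma 3. $\mathbf{\Delta}^*$ is a nonzero positive semidefinite matrix.

Proof. See Appendix C. □

If $\mathbf{\Delta}^*$ happens to be positive definite, then distortion projection turns out to be unnecessary. To handle the case in which
$\mathbf{\Delta}^*$ is singular, we shall use distortion projection. Since $\mathbf{\Delta}^*$, $\mathbf{M}_1^*$, and $\mathbf{M}_2^*$ are positive semidefinite, we can write their spectral
decompositions as

\[ \mathbf{\Delta}^* = \sum_{i=1}^{r} \lambda_i \mathbf{s}_i \mathbf{s}_i^T, \quad (13) \]
\[ \mathbf{M}_1^* = \sum_{i=1}^{p} \alpha_i \mathbf{a}_i \mathbf{a}_i^T, \text{ and} \quad (14) \]
\[ \mathbf{M}_2^* = \sum_{i=1}^{q} \beta_i \mathbf{b}_i \mathbf{b}_i^T, \quad (15) \]

where

(i) $0 < r \le m$,
(ii) $0 \le p, q \le m$,
(iii) $\lambda_i > 0$, for all $i \in \{1, \dots, r\}$,
(iv) $\alpha_i > 0$, for all $i \in \{1, \dots, p\}$,
(v) $\beta_i > 0$, for all $i \in \{1, \dots, q\}$, and
(vi) $\{\mathbf{s}_i\}_{i=1}^{r}$, $\{\mathbf{a}_i\}_{i=1}^{p}$, and $\{\mathbf{b}_i\}_{i=1}^{q}$ are sets of orthonormal vectors.

Note that we allow $p$ and $q$ to be zero because $\mathbf{M}_1^*$ and $\mathbf{M}_2^*$ can be zero. Since (12) implies

\[ \mathbf{\Delta}^* + \mathbf{M}_1^* = \frac{\mu}{2} (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*)^{-1} \succ \mathbf{0} \text{ and} \]
\[ \mathbf{\Delta}^* + \mathbf{M}_2^* = \frac{1}{2} (\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2^*)^{-1} \succ \mathbf{0}, \]

we must have

\[ r + p \ge m \text{ and } r + q \ge m. \]

This means that if $r + p = m$, then $\mathbf{s}_1, \mathbf{s}_2, \dots, \mathbf{s}_r, \mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_p$ must be linearly independent. Similarly, if $r + q = m$, then
$\mathbf{s}_1, \mathbf{s}_2, \dots, \mathbf{s}_r, \mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_q$ must be linearly independent.
Define the matrix

\[ \mathbf{S} \triangleq \left[ \sqrt{\lambda_1}\, \mathbf{s}_1, \sqrt{\lambda_2}\, \mathbf{s}_2, \dots, \sqrt{\lambda_r}\, \mathbf{s}_r \right]. \]

It now follows from the definition of $\mathbf{\Delta}^*$ that

\[ \mathbf{\Lambda}^* \succeq \mathbf{\Delta}^* = \mathbf{S}\mathbf{S}^T \]

because

\[ (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_1^* - \mathbf{B}_2^*)^{-1} \succeq (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*)^{-1}. \]

This and (10) imply that

\[ (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_1^* - \mathbf{B}_2^* - \mathbf{D}) \mathbf{S} = \mathbf{0}. \quad (16) \]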
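Let $\mathbf{C}$ be an $m \times m$ positive definite matrix and $\{\mathbf{C}_1, \mathbf{C}_2, \dots, \mathbf{C}_l\}$ be a set of $m \times m$ positive definite matrices.

Definition 2. A non-zero $m \times p$ matrix $\mathbf{E}$ is $\mathbf{C}$-orthogonal if $\mathbf{E}^T \mathbf{C} \mathbf{E}$ is a diagonal matrix.

Definition 3. A non-zero $m \times p$ matrix $\mathbf{E}$ is $\{\mathbf{C}_1, \mathbf{C}_2, \dots, \mathbf{C}_l\}$-orthogonal if it is $\mathbf{C}_i$-orthogonal for all $i \in \{1, 2, \dots, l\}$.

Definition 4. A non-zero $m \times p$ matrix $\mathbf{E}$ and a non-zero $m \times q$ matrix $\mathbf{F}$ are cross $\mathbf{C}$-orthogonal if $\mathbf{E}^T \mathbf{C} \mathbf{F} = \mathbf{0}$.

Definition 5. A non-zero $m \times p$ matrix $\mathbf{E}$ and a non-zero $m \times q$ matrix $\mathbf{F}$ are cross $\{\mathbf{C}_1, \mathbf{C}_2, \dots, \mathbf{C}_l\}$-orthogonal if they are
cross $\mathbf{C}_i$-orthogonal for all $i \in \{1, 2, \dots, l\}$.

Definition 6. A non-zero vector $\mathbf{w}$ is in $\mathrm{span}\{\mathbf{c}_i\}_{i=1}^{l}$ if there exist real numbers $\{\gamma_i\}_{i=1}^{l}$ such that

\[ \mathbf{w} = \sum_{i=1}^{l} \gamma_i \mathbf{c}_i. \]

We denote this as

\[ \mathbf{w} \in \mathrm{span}\{\mathbf{c}_i\}_{i=1}^{l}. \]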
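
Definitions 2 through 6 translate directly into numerical checks. The sketch below is illustrative only: the function names and the tolerance `tol` are hypothetical choices, and the span test uses a least-squares fit as one convenient way to decide membership.

```python
import numpy as np

def is_C_orthogonal(E, C, tol=1e-9):
    """Definition 2: E^T C E is a diagonal matrix."""
    G = E.T @ C @ E
    return bool(np.allclose(G, np.diag(np.diag(G)), atol=tol))

def is_set_orthogonal(E, Cs, tol=1e-9):
    """Definition 3: E is C_i-orthogonal for every C_i in the list Cs."""
    return all(is_C_orthogonal(E, C, tol) for C in Cs)

def are_cross_C_orthogonal(E, F, C, tol=1e-9):
    """Definition 4: E^T C F = 0."""
    return bool(np.allclose(E.T @ C @ F, 0.0, atol=tol))

def are_cross_set_orthogonal(E, F, Cs, tol=1e-9):
    """Definition 5: E and F are cross C_i-orthogonal for every C_i in Cs."""
    return all(are_cross_C_orthogonal(E, F, C, tol) for C in Cs)

def in_span(w, vectors, tol=1e-9):
    """Definition 6: w is a real linear combination of the given vectors."""
    V = np.column_stack(vectors)
    coeffs, *_ = np.linalg.lstsq(V, w, rcond=None)
    return bool(np.allclose(V @ coeffs, w, atol=tol))

# Hypothetical example in dimension m = 3.
C = np.diag([1.0, 2.0, 3.0])
I3 = np.eye(3)
E = I3[:, :2]                                          # columns e1, e2
F = I3[:, 2:]                                          # column e3
E_bad = np.array([[1.0, 1.0], [1.0, -1.0], [0.0, 0.0]])
```

Here `E` is $\mathbf{C}$-orthogonal because $\mathbf{E}^T \mathbf{C} \mathbf{E} = \mathrm{Diag}(1, 2)$, while `E_bad` is not: its columns are orthogonal in the usual sense, but $\mathbf{E}^T \mathbf{C} \mathbf{E}$ has off-diagonal entries equal to $-1$.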

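We have the following theorem about the optimal solution to the optimization problem $(P_{G2})$.

Theorem 2. There exist two matrices

\[ \mathbf{T} \triangleq [\mathbf{t}_1, \mathbf{t}_2, \dots, \mathbf{t}_{m-r}] \]

and

\[ \mathbf{W} \triangleq [\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_{m-r}] \]

such that $[\mathbf{S}, \mathbf{T}]$ and $[\mathbf{S}, \mathbf{W}]$ are invertible and if $r < m$ then

(a) $\mathbf{t}_1, \mathbf{t}_2, \dots, \mathbf{t}_{m-r} \in \mathrm{span}\{\mathbf{a}_i\}_{i=1}^{p}$,

(b) $\mathbf{T}$ is $\{(\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*), (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_1^* - \mathbf{B}_2^*)\}$-orthogonal with

\[ \mathbf{T}^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{T} = \mathbf{T}^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_1^* - \mathbf{B}_2^*) \mathbf{T}, \]

(c) $\mathbf{S}$ and $\mathbf{T}$ are cross $\{\mathbf{D}, (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*), (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_1^* - \mathbf{B}_2^*)\}$-orthogonal,

(d) $\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_{m-r} \in \mathrm{span}\{\mathbf{b}_i\}_{i=1}^{q}$,

(e) $\mathbf{W}$ is $\{\mathbf{K}_{\mathbf{Y}}, (\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2^*)\}$-orthogonal with

\[ \mathbf{W}^T \mathbf{K}_{\mathbf{Y}} \mathbf{W} = \mathbf{W}^T (\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2^*) \mathbf{W}, \text{ and} \]

(f) $\mathbf{S}$ and $\mathbf{W}$ are cross $\{\mathbf{K}_{\mathbf{Y}}, (\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2^*)\}$-orthogonal.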
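Proof. It suffices to consider the case $r < m$. Since $\mathbf{\Delta}^* = \mathbf{S}\mathbf{S}^T$ is rank deficient in this case, there exists $\mathbf{z}_1 \neq \mathbf{0}$ such that

\[ \mathbf{S}^T \mathbf{z}_1 = \mathbf{0}. \]

Let us define

\[ \mathbf{t}_1 \triangleq (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*)^{-1} \mathbf{z}_1. \]

Therefore

\[ \mathbf{S}^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{t}_1 = \mathbf{0}. \]

We have from (12), (13), and (14) that

\[ \frac{\mu}{2} (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*)^{-1} = \mathbf{\Delta}^* + \mathbf{M}_1^* = \mathbf{S}\mathbf{S}^T + \sum_{i=1}^{p} \alpha_i \mathbf{a}_i \mathbf{a}_i^T. \]

On post-multiplying this by $(\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{t}_1$, we obtain

\[ \frac{\mu}{2} \mathbf{t}_1 = \mathbf{S}\mathbf{S}^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{t}_1 + \sum_{i=1}^{p} \alpha_i \mathbf{a}_i \mathbf{a}_i^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{t}_1 = \sum_{i=1}^{p} \alpha_i \mathbf{a}_i \left( \mathbf{a}_i^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{t}_1 \right). \]

This proves that

\[ \mathbf{t}_1 \in \mathrm{span}\{\mathbf{a}_i\}_{i=1}^{p}. \]

We next show that

\[ \mathbf{t}_1 \notin \mathrm{span}\{\mathbf{s}_i\}_{i=1}^{r}. \]

Suppose otherwise that

\[ \mathbf{t}_1 \in \mathrm{span}\{\mathbf{s}_i\}_{i=1}^{r}. \]

Then there exist real numbers $\{c_i\}_{i=1}^{r}$ such that

\[ \mathbf{t}_1 = \sum_{i=1}^{r} c_i \mathbf{s}_i. \]

Since $\mathbf{S}^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{t}_1 = \mathbf{0}$, we have

\[ \mathbf{s}_i^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{t}_1 = 0 \text{ for all } i \in \{1, 2, \dots, r\}. \]

On multiplying this by $c_i$ and then summing over all $i$ in $\{1, 2, \dots, r\}$, we obtain

\[ \mathbf{t}_1^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{t}_1 = 0, \]

which is a contradiction because $\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*$ is positive definite and $\mathbf{t}_1 \neq \mathbf{0}$. We therefore have that

\[ \mathbf{t}_1 \notin \mathrm{span}\{\mathbf{s}_i\}_{i=1}^{r}. \]

We have shown so far that there exists $\mathbf{t}_1 \in \mathrm{span}\{\mathbf{a}_i\}_{i=1}^{p}$ such that the rank of $[\mathbf{S}, \mathbf{t}_1]$ is $r + 1$ and

\[ \mathbf{S}^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{t}_1 = \mathbf{0}. \]

Let us now assume that there exists

\[ \mathbf{T}_j = [\mathbf{t}_1, \mathbf{t}_2, \dots, \mathbf{t}_j], \]

where

\[ \mathbf{t}_1, \mathbf{t}_2, \dots, \mathbf{t}_j \in \mathrm{span}\{\mathbf{a}_i\}_{i=1}^{p} \]

and $1 \le j < m - r$, such that the rank of $[\mathbf{S}, \mathbf{T}_j]$ is $r + j$,

\[ \mathbf{S}^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{T}_j = \mathbf{0}, \]

and

\[ \mathbf{t}_k^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{t}_l = 0 \]

for all $k \neq l$ in $\{1, 2, \dots, j\}$. Then there exists $\mathbf{z}_{j+1} \neq \mathbf{0}$ such that

\[ [\mathbf{S}, \mathbf{T}_j]^T \mathbf{z}_{j+1} = \mathbf{0}. \]

Let us define

\[ \mathbf{t}_{j+1} \triangleq (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*)^{-1} \mathbf{z}_{j+1}. \]

We therefore have that

\[ [\mathbf{S}, \mathbf{T}_j]^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{t}_{j+1} = \mathbf{0}. \]

It can be shown as before that

\[ \mathbf{t}_{j+1} \in \mathrm{span}\{\mathbf{a}_i\}_{i=1}^{p} \]

and

\[ \mathbf{t}_{j+1} \notin \mathrm{span}\left\{ \{\mathbf{s}_i\}_{i=1}^{r}, \{\mathbf{t}_k\}_{k=1}^{j} \right\}. \]

Hence, the rank of $[\mathbf{S}, \mathbf{T}_{j+1}]$, where

\[ \mathbf{T}_{j+1} = [\mathbf{T}_j, \mathbf{t}_{j+1}], \]

is $r + j + 1$,

\[ \mathbf{S}^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{T}_{j+1} = \mathbf{0}, \]

and

\[ \mathbf{t}_k^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{t}_l = 0 \]

for all $k \neq l$ in $\{1, 2, \dots, j + 1\}$. It now follows by mathematical induction that there exist

\[ \mathbf{t}_1, \mathbf{t}_2, \dots, \mathbf{t}_{m-r} \in \mathrm{span}\{\mathbf{a}_i\}_{i=1}^{p} \]

such that if we define

\[ \mathbf{T} = [\mathbf{t}_1, \mathbf{t}_2, \dots, \mathbf{t}_{m-r}], \]

then $[\mathbf{S}, \mathbf{T}]$ is invertible,

\[ \mathbf{S}^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{T} = \mathbf{0}, \text{ and} \]
\[ \mathbf{T}^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{T} = \mathbf{G}, \]

where

\[ \mathbf{G} \triangleq \mathrm{Diag}\left\{ \mathbf{t}_1^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{t}_1, \; \mathbf{t}_2^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{t}_2, \; \dots, \; \mathbf{t}_{m-r}^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{t}_{m-r} \right\}. \]

Since $\mathbf{B}_1^* \mathbf{T} = \mathbf{0}$ from (9) and $(\mathbf{K}_{\mathbf{X}} - \mathbf{B}_1^* - \mathbf{B}_2^*) \mathbf{S} = \mathbf{D}\mathbf{S}$ from (16), we immediately have that

\[ \mathbf{S}^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{T} = \mathbf{S}^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_1^* - \mathbf{B}_2^*) \mathbf{T} = \mathbf{S}^T \mathbf{D} \mathbf{T} = \mathbf{0}, \text{ and} \]
\[ \mathbf{T}^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{T} = \mathbf{T}^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_1^* - \mathbf{B}_2^*) \mathbf{T} = \mathbf{G}. \]

This completes the proof of parts (a) through (c) of the theorem.

For parts (d) through (f), we have from (12), (13), and (15) that

\[ \frac{1}{2} (\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2^*)^{-1} = \mathbf{\Delta}^* + \mathbf{M}_2^* = \mathbf{S}\mathbf{S}^T + \sum_{i=1}^{q} \beta_i \mathbf{b}_i \mathbf{b}_i^T. \]

Similar to the previous case, we can find

\[ \mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_{m-r} \in \mathrm{span}\{\mathbf{b}_i\}_{i=1}^{q} \]

such that if we define

\[ \mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_{m-r}], \]

then $[\mathbf{S}, \mathbf{W}]$ is invertible,

\[ \mathbf{S}^T (\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2^*) \mathbf{W} = \mathbf{0}, \text{ and} \]
\[ \mathbf{W}^T (\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2^*) \mathbf{W} = \mathbf{H}, \]

where

\[ \mathbf{H} \triangleq \mathrm{Diag}\left\{ \mathbf{w}_1^T (\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2^*) \mathbf{w}_1, \; \mathbf{w}_2^T (\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2^*) \mathbf{w}_2, \; \dots, \; \mathbf{w}_{m-r}^T (\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2^*) \mathbf{w}_{m-r} \right\}. \]

Since $\mathbf{B}_2^* \mathbf{W} = \mathbf{0}$ from (9), we conclude

\[ \mathbf{S}^T \mathbf{K}_{\mathbf{Y}} \mathbf{W} = \mathbf{S}^T (\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2^*) \mathbf{W} = \mathbf{0}, \text{ and} \]
\[ \mathbf{W}^T \mathbf{K}_{\mathbf{Y}} \mathbf{W} = \mathbf{W}^T (\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2^*) \mathbf{W} = \mathbf{H}. \]

This completes the proof of parts (d) through (f) of the theorem. □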

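We have the following corollary of Theorem 2.

Corollary 1. If $r < m = r + p$, then we can set

\[ \mathbf{t}_i = \sqrt{\alpha_i}\, \mathbf{a}_i \]

for all $i$ in $\{1, 2, \dots, p\}$. Similarly, if $r < m = r + q$, then we can set

\[ \mathbf{w}_i = \sqrt{\beta_i}\, \mathbf{b}_i \]

for all $i$ in $\{1, 2, \dots, q\}$.

Proof. Let $r < m = r + p$ and let us set

\[ \mathbf{t}_i = \sqrt{\alpha_i}\, \mathbf{a}_i \]

for all $i$ in $\{1, 2, \dots, p\}$ in the definition of $\mathbf{T}$. We have from (12), (13), and (14) that

\[ \frac{\mu}{2} (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*)^{-1} = \sum_{i=1}^{r} \lambda_i \mathbf{s}_i \mathbf{s}_i^T + \sum_{i=1}^{p} \alpha_i \mathbf{a}_i \mathbf{a}_i^T. \quad (17) \]

Now, on post-multiplying (17) by $(\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{s}_1$, we obtain

\[ \frac{\mu}{2} \mathbf{s}_1 = \sum_{i=1}^{r} \lambda_i \mathbf{s}_i \left( \mathbf{s}_i^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{s}_1 \right) + \sum_{i=1}^{p} \alpha_i \mathbf{a}_i \left( \mathbf{a}_i^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{s}_1 \right), \]

which can be re-written as

\[ \mathbf{s}_1 \left( \frac{\mu}{2} - \lambda_1 \left( \mathbf{s}_1^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{s}_1 \right) \right) - \sum_{i=2}^{r} \lambda_i \mathbf{s}_i \left( \mathbf{s}_i^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{s}_1 \right) = \sum_{i=1}^{p} \alpha_i \mathbf{a}_i \left( \mathbf{a}_i^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{s}_1 \right). \quad (18) \]

Since $[\mathbf{S}, \mathbf{T}]$ is invertible from (17), its columns are linearly independent. Hence, the coefficients of all vectors in (18) must
be zero. Therefore,

\[ \lambda_1 \mathbf{s}_1^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{s}_1 = \frac{\mu}{2}, \]
\[ \mathbf{s}_i^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{s}_1 = 0, \; \forall i \in \{2, \dots, r\}, \text{ and} \]
\[ \mathbf{a}_i^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{s}_1 = 0, \; \forall i \in \{1, \dots, p\}. \]

Likewise, on post-multiplying (17) by $(\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{s}_2, \dots, (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{s}_r, (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{a}_1, \dots, (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{a}_p$ and then equating
all coefficients to zero, we obtain similar equations. In summary,

\[ \lambda_i \mathbf{s}_i^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{s}_i = \frac{\mu}{2}, \; \forall i \in \{1, \dots, r\}, \]
\[ \alpha_i \mathbf{a}_i^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{a}_i = \frac{\mu}{2}, \; \forall i \in \{1, \dots, p\}, \]
\[ \mathbf{s}_i^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{s}_j = 0, \; \forall i, j \in \{1, \dots, r\}, \; i \neq j, \]
\[ \mathbf{a}_i^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{a}_j = 0, \; \forall i, j \in \{1, \dots, p\}, \; i \neq j, \text{ and} \]
\[ \mathbf{s}_i^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) \mathbf{a}_j = 0, \; \forall i \in \{1, \dots, r\}, \; \forall j \in \{1, \dots, p\}. \]

Hence,

\[ [\mathbf{S}, \mathbf{T}]^T (\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*) [\mathbf{S}, \mathbf{T}] = \frac{\mu}{2} \mathbf{I}_m. \quad (19) \]

Parts (a) through (c) of Theorem 2 follow immediately from (10) and (19) because $\mathbf{M}_1^* = \mathbf{T}\mathbf{T}^T$ in this case.
The proof for the case when $r < m = r + q$ is exactly similar. It starts with the following from (12), (13), and (15):

\[ \frac{1}{2} (\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2^*)^{-1} = \sum_{i=1}^{r} \lambda_i \mathbf{s}_i \mathbf{s}_i^T + \sum_{i=1}^{q} \beta_i \mathbf{b}_i \mathbf{b}_i^T. \]

□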



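In summary, the key properties of the optimal Gaussian solution are as follows. If $\mathbf{\Delta}^* = \mathbf{S}\mathbf{S}^T$ (and hence $\mathbf{S}$) does not have full rank $m$,
then there exist two matrices $\mathbf{T}$ and $\mathbf{W}$ such that their columns respectively are in $\mathrm{span}\{\mathbf{a}_i\}_{i=1}^{p}$ and $\mathrm{span}\{\mathbf{b}_i\}_{i=1}^{q}$, $[\mathbf{S}, \mathbf{T}]$ and
$[\mathbf{S}, \mathbf{W}]$ are invertible, $\mathbf{S}$ and $\mathbf{T}$ are cross $(\mathbf{K}_{\mathbf{X}} - \mathbf{B}_2^*)$-orthogonal, and $\mathbf{S}$ and $\mathbf{W}$ are cross $(\mathbf{K}_{\mathbf{Y}} - \mathbf{B}_2^*)$-orthogonal. We shall
exploit these properties in the next section to characterize the optimal solution of an optimization problem that is central to proving our
main result.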



6 Converse Ingredients 

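Let us define an optimization problem as

\[ (P) \triangleq \min_{\mathbf{U}, \mathbf{V}} \; \mu I(\mathbf{X}; \mathbf{U} | \mathbf{V}) + I(\mathbf{Y}; \mathbf{V}) \]
\[ \text{subject to} \quad \mathbf{K}_{\mathbf{X}|\mathbf{U},\mathbf{V}} \preceq \mathbf{D} \text{ and } \mathbf{X} \leftrightarrow \mathbf{Y} \leftrightarrow \mathbf{V}, \]

where $\mathbf{X}$, $\mathbf{Y}$, $\mathbf{D}$, and $\mu$ are defined as before. We refer to this problem as the main optimization problem and denote it by $(P)$.
We have the following theorem.

Theorem 3. A Gaussian $(\mathbf{U}, \mathbf{V})$ is an optimal solution of the main optimization problem $(P)$.

We prove this theorem in the remainder of the section. The proof for $\mu$ in $[0, 1]$ is easy. In this case, the objective of $(P)$
can be lower bounded as

\[ \mu I(\mathbf{X}; \mathbf{U} | \mathbf{V}) + I(\mathbf{Y}; \mathbf{V}) = \mu I(\mathbf{X}; \mathbf{U}, \mathbf{V}) - \mu I(\mathbf{X}; \mathbf{V}) + I(\mathbf{Y}; \mathbf{V}) \]
\[ = \mu I(\mathbf{X}; \mathbf{U}) + \mu I(\mathbf{X}; \mathbf{V} | \mathbf{U}) + \mu \left[ I(\mathbf{Y}; \mathbf{V}) - I(\mathbf{X}; \mathbf{V}) \right] + (1 - \mu) I(\mathbf{Y}; \mathbf{V}) \]
\[ \ge \mu I(\mathbf{X}; \mathbf{U}) \quad (20) \]
\[ = \mu h(\mathbf{X}) - \mu h(\mathbf{X} | \mathbf{U}) \]
\[ \ge \frac{\mu}{2} \log \frac{|\mathbf{K}_{\mathbf{X}}|}{|\mathbf{K}_{\mathbf{X}|\mathbf{U}}|}, \quad (21) \]

where

(20) follows because of the facts that

\[ I(\mathbf{Y}; \mathbf{V}) \ge 0 \text{ and } I(\mathbf{X}; \mathbf{V} | \mathbf{U}) \ge 0, \]

and we have

\[ I(\mathbf{Y}; \mathbf{V}) - I(\mathbf{X}; \mathbf{V}) \ge 0 \]

because of the data processing inequality [21, Theorem 2.8.1] and the Markov chain $\mathbf{X} \leftrightarrow \mathbf{Y} \leftrightarrow \mathbf{V}$, and

(21) follows because the Gaussian distribution maximizes the differential entropy for a given covariance matrix [21, Theorem
8.6.5], i.e.,

\[ h(\mathbf{X} | \mathbf{U}) \le \frac{1}{2} \log \left( (2\pi e)^m |\mathbf{K}_{\mathbf{X}|\mathbf{U}}| \right). \]

Inequalities (20) and (21) become equalities if we choose a Gaussian $(\mathbf{U}, \mathbf{V})$ such that $\mathbf{V}$ is independent of $(\mathbf{X}, \mathbf{Y}, \mathbf{U})$.
Because of the distortion constraint in $(P)$, the conditional covariance of $\mathbf{X}$ given $(\mathbf{U}, \mathbf{V})$ should satisfy

\[ \mathbf{0} \preceq \mathbf{K}_{\mathbf{X}|\mathbf{U},\mathbf{V}} = \mathbf{K}_{\mathbf{X}|\mathbf{U}} \preceq \mathbf{D}. \]

Since conditioning reduces covariance in a positive semidefinite sense, we also have

\[ \mathbf{K}_{\mathbf{X}|\mathbf{U}} \preceq \mathbf{K}_{\mathbf{X}}. \]

Hence, if $\mu$ is in $[0, 1]$, then a Gaussian $(\mathbf{U}, \mathbf{V})$ is an optimal solution of the main optimization problem $(P)$ and the optimal
value is

\[ v(P) = \min_{\mathbf{K}_{\mathbf{X}|\mathbf{U}}} \; \frac{\mu}{2} \log \frac{|\mathbf{K}_{\mathbf{X}}|}{|\mathbf{K}_{\mathbf{X}|\mathbf{U}}|} \quad \text{subject to} \quad \mathbf{K}_{\mathbf{X}} \succeq \mathbf{K}_{\mathbf{X}|\mathbf{U}} \succeq \mathbf{0} \text{ and } \mathbf{D} \succeq \mathbf{K}_{\mathbf{X}|\mathbf{U}} \]
\[ = v(P_{pt\text{-}pt}). \quad (22) \]

We therefore assume that $\mu > 1$ in the rest of the section.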

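Let us first restrict the solution space of (P) to Gaussian distributions. This results in an optimization problem $(P_{G1})$, or equivalently $(P_{G2})$, defined in Section 5. For convenience, we shall work with the $(P_{G2})$ formulation. First note that since restricting the solution space to Gaussian distributions can only increase the optimal value of the main optimization problem (P), we immediately have
\[ v(P_{G1}) = v(P_{G2}) \ge v(P). \tag{23} \]
So, it suffices to prove the reverse inequality
\[ v(P_{G2}) \le v(P). \]
Let $(B_1^*, B_2^*)$ be an optimal solution to $(P_{G2})$. As discussed in Section 5, $(B_1^*, B_2^*)$ gives three matrices $S$, $T$, and $W$ that satisfy the properties in Theorem 2. Using these properties, the optimal value of $(P_{G2})$ can be expressed as
\begin{align*}
v(P_{G2}) &= \frac{\mu}{2}\log\frac{|K_X - B_2^*|}{|K_X - B_1^* - B_2^*|} + \frac{1}{2}\log\frac{|K_Y|}{|K_Y - B_2^*|} \\
&= \frac{\mu}{2}\log\frac{\left|[S,T]^T (K_X - B_2^*) [S,T]\right|}{\left|[S,T]^T (K_X - B_1^* - B_2^*) [S,T]\right|} + \frac{1}{2}\log\frac{\left|[S,W]^T K_Y [S,W]\right|}{\left|[S,W]^T (K_Y - B_2^*) [S,W]\right|} \tag{24} \\
&= \frac{\mu}{2}\log\frac{\left|\begin{pmatrix} S^T (K_X - B_2^*) S & 0 \\ 0 & T^T (K_X - B_2^*) T \end{pmatrix}\right|}{\left|\begin{pmatrix} S^T (K_X - B_1^* - B_2^*) S & 0 \\ 0 & T^T (K_X - B_1^* - B_2^*) T \end{pmatrix}\right|} + \frac{1}{2}\log\frac{\left|\begin{pmatrix} S^T K_Y S & 0 \\ 0 & W^T K_Y W \end{pmatrix}\right|}{\left|\begin{pmatrix} S^T (K_Y - B_2^*) S & 0 \\ 0 & W^T (K_Y - B_2^*) W \end{pmatrix}\right|} \\
&= \frac{\mu}{2}\log\frac{|S^T (K_X - B_2^*) S|\,|T^T (K_X - B_2^*) T|}{|S^T (K_X - B_1^* - B_2^*) S|\,|T^T (K_X - B_1^* - B_2^*) T|} + \frac{1}{2}\log\frac{|S^T K_Y S|\,|W^T K_Y W|}{|S^T (K_Y - B_2^*) S|\,|W^T (K_Y - B_2^*) W|} \tag{25} \\
&= \frac{\mu}{2}\log\frac{|S^T (K_X - B_2^*) S|}{|S^T D S|} + \frac{1}{2}\log\frac{|S^T K_Y S|}{|S^T (K_Y - B_2^*) S|}, \tag{26}
\end{align*}
where

(24) follows because $[S,T]$ and $[S,W]$ are invertible,

(25) follows because $S$ and $T$ are cross $\{(K_X - B_2^*), (K_X - B_1^* - B_2^*)\}$-orthogonal, and $S$ and $W$ are cross $\{K_Y, (K_Y - B_2^*)\}$-orthogonal, and

(26) follows from (16) and the facts that
\[ T^T (K_X - B_2^*) T = T^T (K_X - B_1^* - B_2^*) T \quad\text{and}\quad W^T K_Y W = W^T (K_Y - B_2^*) W. \]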
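The step from (24) to (25) relies only on the fact that a determinant factors once the off-diagonal blocks vanish. The following minimal sketch checks this on a hypothetical 2 x 2 example, where `A`, `s`, and `t` are illustrative values (not taken from the paper) chosen so that $s^T A t = 0$:

```python
def quad(u, A, v):
    """Bilinear form u^T A v for 2-vectors u, v and a 2x2 matrix A."""
    Av = [A[0][0] * v[0] + A[0][1] * v[1],
          A[1][0] * v[0] + A[1][1] * v[1]]
    return u[0] * Av[0] + u[1] * Av[1]


def det2(M):
    """Determinant of a 2x2 matrix given as nested lists."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def congruence(s, t, A):
    """Return [s, t]^T A [s, t] for column vectors s and t."""
    return [[quad(s, A, s), quad(s, A, t)],
            [quad(t, A, s), quad(t, A, t)]]


# Hypothetical data: A is symmetric positive definite, and t is chosen
# so that s^T A t = 0, i.e., s and t are cross A-orthogonal.
A = [[2, 1], [1, 3]]
s = [1, 0]
t = [1, -2]

gram = congruence(s, t, A)
lhs = det2(gram)                      # |[s, t]^T A [s, t]|
rhs = quad(s, A, s) * quad(t, A, t)   # |s^T A s| |t^T A t|
print(gram, lhs, rhs)
```

In this toy case the congruence $[s,t]^T A [s,t]$ is diagonal, so its determinant is the product of the two diagonal entries, which is the scalar analogue of the factorization used in (25).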



6.1 Distortion Projection 

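The special structure of the optimal Gaussian solution of $(P_{G2})$ suggests a way to lower bound (P) by projecting the sources $X$ and $Y$ on $S$ and imposing the distortion constraint on the subspace spanned by the columns of $S$. Note that the distortion constraint is tight on this subspace for the optimal Gaussian solution. We refer to this method of lower bounding (P) as distortion projection. Let us define
\begin{align*}
\bar X &\triangleq S^T X, \\
\bar Y &\triangleq S^T Y, \\
\bar D &\triangleq S^T D S, \\
\bar B_1^* &\triangleq S^T B_1^* S, \\
\bar B_2^* &\triangleq S^T B_2^* S, \\
\bar M_1^* &\triangleq \left(S^T (K_X - B_2^*) S\right)^{-1} S^T (K_X - B_2^*) M_1^* (K_X - B_2^*) S \left(S^T (K_X - B_2^*) S\right)^{-1}, \text{ and} \\
\bar M_2^* &\triangleq \left(S^T (K_Y - B_2^*) S\right)^{-1} S^T (K_Y - B_2^*) M_2^* (K_Y - B_2^*) S \left(S^T (K_Y - B_2^*) S\right)^{-1}.
\end{align*}
Since $S$ has full column rank, we immediately have that
\begin{align*}
K_{\bar X}, K_{\bar Y}, \bar D &\succ 0, \\
\bar B_1^*, \bar B_2^* &\succeq 0, \text{ and} \\
\bar M_1^*, \bar M_2^* &\succeq 0.
\end{align*}
The projected optimization problem $(\bar P)$ is now defined as
\begin{align*}
(\bar P) \triangleq \min_{U,V} \ & \mu I(\bar X; U|V) + I(\bar Y; V) \\
\text{subject to } & K_{\bar X|U,V} \preceq \bar D \text{ and} \\
& \bar X \leftrightarrow \bar Y \leftrightarrow V.
\end{align*}
We next show that the main optimization problem (P) is lower bounded by the projected optimization problem $(\bar P)$. Since $[S,T]$ and $[S,W]$ are invertible and mutual information is nonnegative, we have
\begin{align*}
\mu I(X;U|V) + I(Y;V) &= \mu I(S^T X, T^T X; U|V) + I(S^T Y, W^T Y; V) \\
&= \mu I(S^T X; U|V) + \mu I(T^T X; U|V, S^T X) + I(S^T Y; V) + I(W^T Y; V|S^T Y) \\
&\ge \mu I(\bar X; U|V) + I(\bar Y; V). \tag{27}
\end{align*}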



Consider any $(U,V)$ feasible for (P). Then
\begin{align*}
& D \succeq K_{X|U,V} \text{ and} \tag{28} \\
& X \leftrightarrow Y \leftrightarrow V. \tag{29}
\end{align*}
Now (28) implies
\[ \bar D = S^T D S \succeq S^T K_{X|U,V} S = K_{\bar X|U,V}, \tag{30} \]
and (29) yields
\begin{align*}
0 &= I(X;V|Y) \\
&= I(S^T X; V|Y) + I(T^T X; V|Y, S^T X) \\
&\ge I(S^T X; V|Y) \tag{31} \\
&= I(S^T X; V|S^T Y, W^T Y) \tag{32} \\
&= h(S^T X|S^T Y, W^T Y) - h(S^T X|V, S^T Y, W^T Y) \\
&\ge h(S^T X|S^T Y) - h(S^T X|V, S^T Y) \tag{33} \\
&= I(S^T X; V|S^T Y) \\
&= I(\bar X; V|\bar Y) \\
&\ge 0, \tag{34}
\end{align*}
where

(31) and (34) follow because mutual information is nonnegative,

(32) follows because $[S,W]$ is invertible, and

(33) follows because conditioning reduces entropy and we have from Theorem 2 that $W^T Y$ is independent of $S^T Y$, which implies that $W^T Y$ is also independent of $S^T X$ because $X = Y + N$ by assumption.

Now (34) is equivalent to
\[ \bar X \leftrightarrow \bar Y \leftrightarrow V, \]
which together with (30) implies that $(U,V)$ is feasible for $(\bar P)$. Hence, the feasible set of (P) is contained in that of $(\bar P)$. Moreover, (27) above implies that the objective of (P) is no less than that of $(\bar P)$. We therefore have that the projected optimization problem $(\bar P)$ lower bounds the main optimization problem (P), i.e.,
\[ v(P) \ge v(\bar P). \tag{35} \]
By restricting the solution space of $(\bar P)$ to Gaussian distributions, we obtain its Gaussian version
\begin{align*}
(\bar P_{G2}) \triangleq \min_{\bar B_1, \bar B_2} \ & \frac{\mu}{2}\log\frac{|K_{\bar X} - \bar B_2|}{|K_{\bar X} - \bar B_1 - \bar B_2|} + \frac{1}{2}\log\frac{|K_{\bar Y}|}{|K_{\bar Y} - \bar B_2|} \\
\text{subject to } & \bar B_i \succeq 0 \text{ for all } i \in \{1,2\}, \text{ and} \\
& \bar D \succeq K_{\bar X} - \bar B_1 - \bar B_2.
\end{align*}
It is easy to verify that the projected optimal Gaussian solution $(\bar B_1^*, \bar B_2^*)$ is feasible for $(\bar P_{G2})$ and it meets the projected distortion constraint $\bar D$ with equality from (16). We next show that $(\bar B_1^*, \bar B_2^*)$ is in fact optimal for $(\bar P)$.

Remark 1: If $r = m$, then there is no need for distortion projection because $S$ is invertible, and hence so is $\Lambda^*$.
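To make the Gaussian objective concrete, the following minimal sketch evaluates the scalar ($r = 1$) version of the $(\bar P_{G2})$ objective in bits. The function names and all numerical values are hypothetical and chosen only so that the result is easy to check by hand. The feasibility check additionally requires $b_2 < k_y$ and $k_x - b_1 - b_2 > 0$ so that both logarithms are finite, which is an assumption of this sketch.

```python
import math


def is_feasible(kx, ky, d, b1, b2):
    """Scalar constraints b1, b2 >= 0 and d >= kx - b1 - b2, plus the
    strict inequalities needed for the logarithms to be finite."""
    return b1 >= 0 and b2 >= 0 and b2 < ky and 0 < kx - b1 - b2 <= d


def gaussian_objective(mu, kx, ky, b1, b2):
    """Scalar Gaussian objective (mu/2) log((kx-b2)/(kx-b1-b2))
    + (1/2) log(ky/(ky-b2)), with logarithms to base 2."""
    term_x = 0.5 * mu * math.log2((kx - b2) / (kx - b1 - b2))
    term_y = 0.5 * math.log2(ky / (ky - b2))
    return term_x + term_y


# Hypothetical scalar instance: X = Y + N with Var(Y) = 4 and Var(N) = 4.
mu, kx, ky, d = 2.0, 8.0, 4.0, 3.0
b1, b2 = 3.0, 2.0   # meets the distortion constraint with equality
print(is_feasible(kx, ky, d, b1, b2), gaussian_objective(mu, kx, ky, b1, b2))
```

Here the first term contributes $\frac{\mu}{2}\log_2 2 = 1$ bit and the second $\frac{1}{2}\log_2 2 = 0.5$ bit, so the weighted sum-rate of this particular feasible point is 1.5 bits. The point is merely feasible; the sketch makes no claim that it is optimal.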

6.2 Source Enhancement 

In this subsection, we use the KKT conditions (7) through (11) satisfied by $(B_1^*, B_2^*)$ to derive conditions that must be satisfied by $(\bar B_1^*, \bar B_2^*)$. These conditions are then used to define the enhanced optimization problem, which lower bounds $(\bar P)$. We show that the optimal solution to the enhanced optimization problem is Gaussian; in particular, $(\bar B_1^*, \bar B_2^*)$ is optimal for the problem. This will in turn prove that $(\bar B_1^*, \bar B_2^*)$ is optimal for $(\bar P)$. This approach of lower bounding is referred to as source enhancement [6] and is similar to the channel enhancement idea of Weingarten et al. [3].
We start with the following key lemma.

Lemma 4. For $K_{\bar X}$, $K_{\bar Y}$, $\bar D$, $\bar B_i^*$, and $\bar M_i^*$, where $i = 1,2$, defined as above, the following hold:
\begin{align*}
& I_r = \frac{\mu}{2}(K_{\bar X} - \bar B_2^*)^{-1} - \bar M_1^* = \frac{1}{2}(K_{\bar Y} - \bar B_2^*)^{-1} - \bar M_2^*, \tag{36} \\
& \bar B_i^* \bar M_i^* = 0 \text{ for all } i \in \{1,2\}, \text{ and} \tag{37} \\
& K_{\bar X} - \bar B_1^* - \bar B_2^* = \bar D. \tag{38}
\end{align*}

Proof. See Appendix D. □

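Let $K_{\tilde X}$ and $K_{\tilde Y}$ be two real symmetric matrices satisfying
\begin{align*}
\frac{\mu}{2}(K_{\tilde X} - \bar B_2^*)^{-1} &= \frac{\mu}{2}(K_{\bar X} - \bar B_2^*)^{-1} - \bar M_1^* \text{ and} \tag{39} \\
\frac{1}{2}(K_{\tilde Y} - \bar B_2^*)^{-1} &= \frac{1}{2}(K_{\bar Y} - \bar B_2^*)^{-1} - \bar M_2^*. \tag{40}
\end{align*}
We now have the following lemma, which is similar to [3, Lemmas 11, 12].

Lemma 5. For $K_{\bar X}$, $K_{\bar Y}$, $K_{\tilde X}$, $K_{\tilde Y}$, $\bar B_i^*$, $\bar M_i^*$, $i = 1,2$, defined as above, and $\mu > 1$, the following hold:
\begin{align*}
& K_{\tilde X} - \bar B_2^* = \frac{\mu}{2} I_r, \tag{41} \\
& K_{\tilde Y} - \bar B_2^* = \frac{1}{2} I_r, \tag{42} \\
& K_{\tilde X} \succ K_{\tilde Y} \succeq K_{\bar Y} \succ 0, \tag{43} \\
& K_{\tilde X} \succeq K_{\bar X} \succ 0, \tag{44} \\
& \frac{|K_{\tilde Y}|}{|K_{\tilde Y} - \bar B_2^*|} = \frac{|K_{\bar Y}|}{|K_{\bar Y} - \bar B_2^*|}, \text{ and} \tag{45} \\
& \frac{|K_{\tilde X} - \bar B_2^*|}{|K_{\tilde X} - \bar B_1^* - \bar B_2^*|} = \frac{|K_{\bar X} - \bar B_2^*|}{|K_{\bar X} - \bar B_1^* - \bar B_2^*|}. \tag{46}
\end{align*}

Proof. See Appendix E. □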
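Let $\tilde X$ and $\tilde Y$ be two zero-mean Gaussian random vectors with covariance matrices $K_{\tilde X}$ and $K_{\tilde Y}$, respectively. Since $K_{\tilde X} \succ K_{\tilde Y}$ from (43), we can write
\[ \tilde X = \tilde Y + \tilde N, \]
where $\tilde N$ is a zero-mean Gaussian random vector with the covariance matrix
\[ K_{\tilde N} = K_{\tilde X} - K_{\tilde Y} = \frac{\mu - 1}{2} I_r \]
and is independent of $\tilde Y$. Similarly, we can use (43) and (44) to relate $\tilde X$ and $\tilde Y$ with $\bar X$ and $\bar Y$, respectively, and write
\begin{align*}
\tilde X &= \bar X + N_1 \text{ and} \\
\tilde Y &= \bar Y + N_2,
\end{align*}
where $N_1$ and $N_2$ are two zero-mean Gaussian random vectors with covariance matrices
\begin{align*}
K_{N_1} &= K_{\tilde X} - K_{\bar X} \text{ and} \\
K_{N_2} &= K_{\tilde Y} - K_{\bar Y},
\end{align*}
respectively, and they are independent of $\bar X$ and $\bar Y$. Using (38), we define
\[ \tilde D \triangleq \bar D + K_{N_1} = K_{\tilde X} - \bar B_1^* - \bar B_2^*. \tag{47} \]
The enhanced optimization problem $(\tilde P)$ is now defined as
\begin{align*}
(\tilde P) \triangleq \min_{U,V} \ & \mu I(\tilde X; U|V) + I(\tilde Y; V) \\
\text{subject to } & K_{\tilde X|U,V} \preceq \tilde D \text{ and} \\
& \tilde X \leftrightarrow \tilde Y \leftrightarrow V.
\end{align*}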

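We next show that $(\tilde P)$ lower bounds $(\bar P)$. Consider any $(U,V)$ feasible for $(\bar P)$. Without loss of optimality, we can assume that the joint distribution between $\bar X$, $\bar Y$, $U$, and $V$ is
\[ \bar p \triangleq p_{\bar X, \bar Y}\, p_{U|\bar X, V}\, p_{V|\bar Y}. \]
Now, $\bar p$ induces two conditional distributions as follows:
\begin{align*}
p_{V|\tilde Y} &\triangleq \int_{\bar Y} p_{V|\bar Y}\, p_{\bar Y|\tilde Y} \text{ and} \\
p_{U|\tilde X, V} &\triangleq \int_{\bar X} p_{U|\bar X, V}\, p_{\bar X|\tilde X, V},
\end{align*}
where
\[ p_{\bar X|\tilde X, V} = \frac{p_{\tilde X, \bar X}\, p_{V|\bar X}}{\int_{\bar X} p_{\tilde X, \bar X}\, p_{V|\bar X}}. \]
Then
\[ \tilde p \triangleq p_{\tilde X, \tilde Y}\, p_{U|\tilde X, V}\, p_{V|\tilde Y} \]
is a joint distribution between $\tilde X$, $\tilde Y$, $U$, and $V$. It is clear that $\tilde p$ satisfies the Markov condition
\[ \tilde X \leftrightarrow \tilde Y \leftrightarrow V. \tag{48} \]
Moreover, (47) and the distortion constraint in the definition of $(\bar P)$ yield
\[ K_{\tilde X|U,V} = K_{\bar X|U,V} + K_{N_1} \preceq \bar D + K_{N_1} = \tilde D. \tag{49} \]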



We next use the chain rule of mutual information to obtain
\begin{align*}
I(\bar X, \tilde X; U|V) &= I(\tilde X; U|V) + I(\bar X; U|V, \tilde X) \\
&= I(\bar X; U|V) + I(\tilde X; U|V, \bar X) \\
&= I(\bar X; U|V)
\end{align*}
and
\begin{align*}
I(\bar Y, \tilde Y; V) &= I(\tilde Y; V) + I(\bar Y; V|\tilde Y) \\
&= I(\bar Y; V) + I(\tilde Y; V|\bar Y) \\
&= I(\bar Y; V).
\end{align*}
Since mutual information is nonnegative, these imply that
\[ I(\bar X; U|V) \ge I(\tilde X; U|V) \tag{50} \]
and
\[ I(\bar Y; V) \ge I(\tilde Y; V). \tag{51} \]
Now (48) and (49) together imply that the distribution $\tilde p$, and hence $(U,V)$, is feasible for $(\tilde P)$. Therefore, the feasible set of $(\bar P)$ is contained in that of $(\tilde P)$. Moreover, (50) and (51) assert that the objective value of $(\tilde P)$ is no more than that of $(\bar P)$. We therefore conclude that the enhanced optimization problem $(\tilde P)$ lower bounds the projected optimization problem $(\bar P)$, i.e.,
\[ v(\bar P) \ge v(\tilde P). \tag{52} \]



Remark 2: If r < m = r + p, then there is no need to enhance the source X and the distortion D because = TT^ 
from Corollary [l] and hence MJ = 0. Similarly, if r < m = r + q, then there is no need to enhance the source Y because 
= WW"^ from Corollary [l] again, and hence M2 = 0. Finally, ifr<m — r+p — r + q, then there is no need for 

source enhancement. 



6.3 Oohama's Approach 

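We now apply Oohama's approach [1] to prove that $(\bar B_1^*, \bar B_2^*)$ is optimal for $(\tilde P)$. The objective of $(\tilde P)$ can be decomposed as
\[ \mu I(\tilde X; U|V) + I(\tilde Y; V) = \mu I(\tilde X; U, V) - \left[\mu I(\tilde X; V) - I(\tilde Y; V)\right]. \tag{53} \]
We next define two subproblems that are used to lower bound the enhanced optimization problem $(\tilde P)$. The first subproblem $(\tilde P_1)$ minimizes the first mutual information on the right-hand side of (53) subject to the distortion constraint in $(\tilde P)$, and the second subproblem $(\tilde P_2)$ maximizes the expression within the brackets on the right-hand side of (53) subject to the Markov condition in $(\tilde P)$. In other words, $(\tilde P_1)$ is defined as
\begin{align*}
(\tilde P_1) \triangleq \min_{U,V} \ & \mu I(\tilde X; U, V) \\
\text{subject to } & K_{\tilde X|U,V} \preceq \tilde D,
\end{align*}
and $(\tilde P_2)$ is defined as
\begin{align*}
(\tilde P_2) \triangleq \max_{V} \ & \mu I(\tilde X; V) - I(\tilde Y; V) \\
\text{subject to } & \tilde X \leftrightarrow \tilde Y \leftrightarrow V.
\end{align*}
It is clear from the decomposition in (53) and from the definitions of $(\tilde P)$, $(\tilde P_1)$, and $(\tilde P_2)$ that $(\tilde P_1)$ and $(\tilde P_2)$ lower bound $(\tilde P)$, i.e.,
\[ v(\tilde P) \ge v(\tilde P_1) - v(\tilde P_2). \tag{54} \]
We now give two lemmas about the optimal solutions to subproblems $(\tilde P_1)$ and $(\tilde P_2)$.

Lemma 6. A Gaussian $(U,V)$ with the conditional covariance matrix
\[ K_{\tilde X|U,V} = K_{\tilde X} - \bar B_1^* - \bar B_2^* = \tilde D \]
is optimal for the subproblem $(\tilde P_1)$, and the optimal value is
\[ v(\tilde P_1) = \frac{\mu}{2}\log\frac{|K_{\tilde X}|}{|\tilde D|}. \tag{55} \]

Proof. See Appendix F. □

Lemma 7. A Gaussian $V$ with the conditional covariance matrix
\[ K_{\tilde Y|V} = K_{\tilde Y} - \bar B_2^* \]
is optimal for the subproblem $(\tilde P_2)$, and the optimal value is
\[ v(\tilde P_2) = \frac{\mu}{2}\log\frac{|K_{\tilde X}|}{|K_{\tilde X} - \bar B_2^*|} - \frac{1}{2}\log\frac{|K_{\tilde Y}|}{|K_{\tilde Y} - \bar B_2^*|}. \tag{56} \]

Proof. See Appendix G. □

Substituting (55) and (56) into (54), we obtain
\begin{align*}
v(\tilde P) &\ge \frac{\mu}{2}\log\frac{|K_{\tilde X} - \bar B_2^*|}{|\tilde D|} + \frac{1}{2}\log\frac{|K_{\tilde Y}|}{|K_{\tilde Y} - \bar B_2^*|} \\
&= \frac{\mu}{2}\log\frac{|K_{\bar X} - \bar B_2^*|}{|\bar D|} + \frac{1}{2}\log\frac{|K_{\bar Y}|}{|K_{\bar Y} - \bar B_2^*|} \tag{57} \\
&= v(P_{G2}), \tag{58}
\end{align*}
where

(57) follows from (38), (45), (46), and (47), and

(58) follows from (26).

We conclude from (35), (52), and (58) that
\[ v(P) \ge v(P_{G2}). \]
It now follows from this and (23) that
\[ v(P) = v(P_{G1}) = v(P_{G2}), \tag{59} \]
which proves that a Gaussian $(U,V)$ is optimal for the main optimization problem (P). This completes the proof of Theorem 3.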

7 Converse Proof of Theorem [1] 

Liu and Viswanath gave a single-letter outer bound to the rate region in [4]. We shall use a similar outer bound that is reminiscent of the Berger-Tung outer bound [9], [10].

Lemma 8. If the rate-distortion vector $(R_1, R_2, D)$ is achievable, then there exist random vectors $U$ and $V$ such that
\begin{align*}
& R_1 \ge I(X; U|V), \\
& R_2 \ge I(Y; V), \\
& D \succeq K_{X|U,V}, \text{ and} \\
& X \leftrightarrow Y \leftrightarrow V.
\end{align*}

The proof of the lemma is similar to [4, Lemma 2] and is omitted. We are now ready to prove the converse of Theorem 1. If $(R_1, R_2, D)$ is achievable, then
\begin{align*}
\mu R_1 + R_2 &\ge v(P) \tag{60} \\
&= \begin{cases} v(P_{pt-pt}) & \text{if } 0 \le \mu \le 1 \\ v(P_{G1}) & \text{if } \mu > 1 \end{cases} \tag{61} \\
&= \mathcal{R}^*(D, \mu), \tag{62}
\end{align*}
where

(60) follows from Lemma 8, and

(61) follows from (22) and (59).

And if $(R_1, R_2, D)$ is only in the closure of the set of achievable rate-distortion vectors, then (62) again holds because $\mathcal{R}^*(D, \mu)$ is continuous in $D$. So, (62) is a lower bound for any $(R_1, R_2)$ in the rate region $\mathcal{R}(D)$. Hence,
\begin{align*}
\mathcal{R}(D, \mu) &= \inf_{(R_1, R_2) \in \mathcal{R}(D)} \mu R_1 + R_2 \\
&\ge \mathcal{R}^*(D, \mu).
\end{align*}
This completes the converse proof of Theorem 1.

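Remark 3: It follows from Theorem 1 and Lemma 1 that one can add the constraints
\begin{align*}
& U \leftrightarrow X \leftrightarrow Y \leftrightarrow V \text{ and} \\
& (U, V, X, Y) \text{ are jointly Gaussian}
\end{align*}
to the optimization problem
\begin{align*}
(P) \triangleq \min_{U,V} \ & \mu I(X; U|V) + I(Y; V) \\
\text{subject to } & K_{X|U,V} \preceq D \text{ and} \\
& X \leftrightarrow Y \leftrightarrow V
\end{align*}
without changing its optimal value.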



8 Solution for the General Case 

In this section, we lift the assumptions on $K_X$, $K_Y$, and $D$ and allow them to be any positive semidefinite matrices. We shall show that the Gaussian achievable scheme is optimal for this general problem. For this section, we denote the rate region of the problem by $\mathcal{R}(K_X, K_Y, D)$. Note that $K_X$ and $K_Y$ completely specify the joint distribution of $X$ and $Y$ because we continue to assume that $X = Y + N$. Similarly, $\mathcal{R}_G(K_X, K_Y, D)$ is used to denote the rate region achieved by the Gaussian achievable scheme. We use $\mathcal{R}(K_X, K_Y, D, \mu)$ and $\mathcal{R}_G(K_X, K_Y, D, \mu)$ to denote the two minimum weighted sum-rates. Likewise, we denote the set $\mathcal{S}$ defined in Section 3.2 by $\mathcal{S}(K_X, K_Y, D)$. We use similar notation later in the section. We start with the following extension.

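Theorem 4. If $K_X$ and $D$ are positive definite, and $K_Y$ is positive semidefinite, then
\[ \mathcal{R}(K_X, K_Y, D, \mu) = \mathcal{R}_G(K_X, K_Y, D, \mu). \]

Proof. It suffices to prove that
\[ \mathcal{R}(K_X, K_Y, D, \mu) \ge \mathcal{R}_G(K_X, K_Y, D, \mu). \]
If $K_Y$ is positive definite (hence nonsingular), then the result follows from Theorem 1. We therefore assume that $K_Y$ is singular and has a rank $p < m$. The eigen decomposition of $K_Y$ is
\[ K_Y = Q \Sigma Q^T, \]
where $Q$ is an orthogonal matrix and
\[ \Sigma = \mathrm{Diag}(\alpha_1, \ldots, \alpha_p, 0, \ldots, 0). \]
Let us partition $Q$ as
\[ Q = [Q_1, Q_2], \]
where $Q_1$ is an $m \times p$ matrix. Let us define
\[ Q^T K_N Q = \begin{pmatrix} E & F^T \\ F & G \end{pmatrix}, \]
where $E$, $F$, and $G$ are submatrices of dimensions $p \times p$, $(m-p) \times p$, and $(m-p) \times (m-p)$, respectively. Since $Q_2^T K_Y Q_2 = 0$ and $X = Y + N$, we have that
\[ G = Q_2^T K_N Q_2 = Q_2^T K_X Q_2 \succ 0, \]
i.e., $G$ is positive definite. Using this, we define
\[ A \triangleq \begin{pmatrix} I_p & -F^T G^{-1} \\ 0 & I_{m-p} \end{pmatrix} Q^T. \]
$A$ defines a transformed problem in which the transformed sources are
\begin{align*}
\tilde X &\triangleq A X \text{ and} \\
\tilde Y &\triangleq A Y,
\end{align*}
which satisfy
\[ \tilde X = \tilde Y + \tilde N, \]
where $\tilde N \triangleq A N$, and the transformed distortion matrix is
\[ \tilde D \triangleq A D A^T. \]
The covariance matrix of the transformed source $\tilde Y$ is
\[ K_{\tilde Y} = A K_Y A^T = \Sigma = \begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix}, \]
where
\[ \Sigma_1 = \mathrm{Diag}(\alpha_1, \ldots, \alpha_p), \]
and the covariance matrix of $\tilde N$ is
\begin{align*}
K_{\tilde N} &= A K_N A^T \\
&= \begin{pmatrix} I_p & -F^T G^{-1} \\ 0 & I_{m-p} \end{pmatrix} \begin{pmatrix} E & F^T \\ F & G \end{pmatrix} \begin{pmatrix} I_p & 0 \\ -G^{-1} F & I_{m-p} \end{pmatrix} \\
&= \begin{pmatrix} E - F^T G^{-1} F & 0 \\ 0 & G \end{pmatrix}.
\end{align*}
Using these, the covariance matrix of the transformed source $\tilde X$ can be expressed as
\begin{align*}
K_{\tilde X} &= K_{\tilde Y} + K_{\tilde N} \\
&= \begin{pmatrix} \Sigma_1 + E - F^T G^{-1} F & 0 \\ 0 & G \end{pmatrix}.
\end{align*}
Since $A$ is invertible, the above transformation is information lossless, and hence the transformed problem is equivalent to the original problem. Therefore,
\begin{align*}
\mathcal{R}(K_{\tilde X}, K_{\tilde Y}, \tilde D, \mu) &= \mathcal{R}(K_X, K_Y, D, \mu) \text{ and} \\
\mathcal{R}_G(K_{\tilde X}, K_{\tilde Y}, \tilde D, \mu) &= \mathcal{R}_G(K_X, K_Y, D, \mu).
\end{align*}
So, it is sufficient to prove that
\[ \mathcal{R}(K_{\tilde X}, K_{\tilde Y}, \tilde D, \mu) \ge \mathcal{R}_G(K_{\tilde X}, K_{\tilde Y}, \tilde D, \mu). \]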
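The block-diagonalization of the noise covariance by $A$ can be checked numerically. The sketch below is a minimal illustration for $m = 2$ and $p = 1$, assuming for simplicity that $Q$ is the identity so that $K_N$ is already in the partitioned form; the scalar values of `E`, `F`, and `G` are hypothetical.

```python
def matmul(P, R):
    """Product of two matrices given as nested lists."""
    return [[sum(P[i][k] * R[k][j] for k in range(len(R)))
             for j in range(len(R[0]))] for i in range(len(P))]


def transpose(P):
    """Transpose of a matrix given as nested lists."""
    return [list(row) for row in zip(*P)]


# Hypothetical scalar blocks of Q^T K_N Q, with Q assumed to be the identity.
E, F, G = 2.0, 1.0, 4.0
K_N = [[E, F], [F, G]]

# A = [[I_p, -F^T G^{-1}], [0, I_{m-p}]] Q^T with Q = I.
A = [[1.0, -F / G], [0.0, 1.0]]

K_N_tilde = matmul(matmul(A, K_N), transpose(A))
print(K_N_tilde)
```

The off-diagonal entries vanish, and the upper-left entry equals the Schur complement $E - F^T G^{-1} F = 2 - 1/4 = 1.75$, in agreement with the expression for $A K_N A^T$ derived in the proof.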
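Let us define the following matrices
\[ K_{\tilde N_1^{(n)}} \triangleq \begin{pmatrix} 0 & 0 \\ 0 & \frac{1}{n} G \end{pmatrix} \quad\text{and}\quad K_{\tilde N_2^{(n)}} \triangleq \begin{pmatrix} E - F^T G^{-1} F & 0 \\ 0 & \left(1 - \frac{1}{n}\right) G \end{pmatrix}, \]
where $n$ is a positive integer. It is clear that these matrices are positive semidefinite and they satisfy
\[ K_{\tilde N} = K_{\tilde N_1^{(n)}} + K_{\tilde N_2^{(n)}}. \]
Let $\tilde N_1^{(n)}$ and $\tilde N_2^{(n)}$ be zero-mean vector Gaussian sources with covariance matrices $K_{\tilde N_1^{(n)}}$ and $K_{\tilde N_2^{(n)}}$, respectively. In addition, suppose they are independent of each other and all other vector Gaussian sources. We can then write
\[ \tilde X = \tilde Y + \tilde N_1^{(n)} + \tilde N_2^{(n)}. \]
Let us consider a new problem in which encoder 1 has access to $\tilde X$, encoder 2 has access to $(\tilde Y, \tilde N_1^{(n)})$, and the distortion constraint on $\tilde X$ is $\tilde D$. This problem is clearly a relaxation of the original problem because encoder 2 has access to more information about $\tilde X$ than in the original problem. In other words, any feasible scheme for the original problem is also feasible for this new problem. Now since there is no distortion constraint on $\tilde Y$ and the sufficient statistic of $\tilde X$ in $(\tilde Y, \tilde N_1^{(n)})$ is $\tilde Y + \tilde N_1^{(n)}$, this new problem is equivalent to the problem in which encoder 2, instead of $(\tilde Y, \tilde N_1^{(n)})$, has access to the sum $\tilde Y + \tilde N_1^{(n)}$. Let us denote this sum by $\tilde Y^{(n)}$, i.e.,
\[ \tilde Y^{(n)} \triangleq \tilde Y + \tilde N_1^{(n)}, \]
which has a positive definite covariance matrix
\[ K_{\tilde Y^{(n)}} = K_{\tilde Y} + K_{\tilde N_1^{(n)}} = \begin{pmatrix} \Sigma_1 & 0 \\ 0 & \frac{1}{n} G \end{pmatrix}. \]
It follows that
\[ \mathcal{R}(K_{\tilde X}, K_{\tilde Y^{(n)}}, \tilde D, \mu) \le \mathcal{R}(K_{\tilde X}, K_{\tilde Y}, \tilde D, \mu). \]
Since this is true for all $n$ and $\mathcal{R}(K_{\tilde X}, K_{\tilde Y^{(n)}}, \tilde D, \mu)$ is nondecreasing in $n$, we obtain
\[ \lim_{n \to \infty} \mathcal{R}(K_{\tilde X}, K_{\tilde Y^{(n)}}, \tilde D, \mu) \le \mathcal{R}(K_{\tilde X}, K_{\tilde Y}, \tilde D, \mu). \tag{63} \]
Since $K_{\tilde X}$, $K_{\tilde Y^{(n)}}$, and $\tilde D$ are positive definite, the conclusion of Theorem 1 holds for this sequence of relaxed problems, i.e., for each $n$
\[ \mathcal{R}(K_{\tilde X}, K_{\tilde Y^{(n)}}, \tilde D, \mu) = \mathcal{R}_G(K_{\tilde X}, K_{\tilde Y^{(n)}}, \tilde D, \mu). \]
This and (63) together imply that
\[ \lim_{n \to \infty} \mathcal{R}_G(K_{\tilde X}, K_{\tilde Y^{(n)}}, \tilde D, \mu) \le \mathcal{R}(K_{\tilde X}, K_{\tilde Y}, \tilde D, \mu). \tag{64} \]
Now for each $n$, there exists $(U^{(n)}, V^{(n)})$ in $\mathcal{S}(K_{\tilde X}, K_{\tilde Y^{(n)}}, \tilde D)$ such that
\[ \mathcal{R}_G(K_{\tilde X}, K_{\tilde Y^{(n)}}, \tilde D, \mu) = \mu I(\tilde X; U^{(n)}|V^{(n)}) + I(\tilde Y^{(n)}; V^{(n)}). \tag{65} \]
Since $\tilde X$, $\tilde Y^{(n)}$, $U^{(n)}$, and $V^{(n)}$ are jointly Gaussian, we can without loss of generality parameterize them by positive semidefinite matrices $B_1$ and $B_2$ as in the definition of $(P_{G2})$. These matrices lie in a compact set because they satisfy the KKT conditions that are continuous, and they are bounded as $B_1 + B_2 \preceq K_{\tilde X}$. Therefore, there exists a subsequence of $K_{\tilde Y^{(n)}}$ along which $(U^{(n)}, V^{(n)})$ converges to $(U, V)$ in $\mathcal{S}(K_{\tilde X}, K_{\tilde Y}, \tilde D)$. Since the right-hand side of (65) is continuous in $(\tilde Y^{(n)}, U^{(n)}, V^{(n)})$, this implies
\begin{align*}
\lim_{n \to \infty} \mathcal{R}_G(K_{\tilde X}, K_{\tilde Y^{(n)}}, \tilde D, \mu) &= \mu I(\tilde X; U|V) + I(\tilde Y; V) \\
&\ge \mathcal{R}_G(K_{\tilde X}, K_{\tilde Y}, \tilde D, \mu). \tag{66}
\end{align*}
It now follows from (64) and (66) that
\[ \mathcal{R}(K_{\tilde X}, K_{\tilde Y}, \tilde D, \mu) \ge \mathcal{R}_G(K_{\tilde X}, K_{\tilde Y}, \tilde D, \mu). \]
This proves Theorem 4. □

We next use Theorem 4 to prove our result for the most general case of the problem.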
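Theorem 5. For any positive semidefinite $K_X$, $K_Y$, and $D$, we have
\[ \mathcal{R}(K_X, K_Y, D, \mu) = \mathcal{R}_G(K_X, K_Y, D, \mu). \]

Proof. Let us suppose that the rank of $K_X$ is $p < m$. Since $K_X$ is positive semidefinite, its eigen decomposition is
\[ K_X = Q \Sigma Q^T, \]
where $Q$ is an orthogonal matrix and
\[ \Sigma = \mathrm{Diag}(\alpha_1, \ldots, \alpha_p, 0, \ldots, 0). \]
Let us partition $Q$ as
\[ Q = [Q_1, Q_2], \]
where $Q_1$ is an $m \times p$ matrix. Since $Q_2^T K_X Q_2 = 0$ and $X = Y + N$, we have
\[ Q_2^T K_Y Q_2 = Q_2^T K_N Q_2 = 0, \]
which implies that
\[ Q^T K_Y Q = \begin{pmatrix} Q_1^T K_Y Q_1 & 0 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad Q^T K_N Q = \begin{pmatrix} Q_1^T K_N Q_1 & 0 \\ 0 & 0 \end{pmatrix}. \]
Let us define
\[ Q^T D Q = \begin{pmatrix} E & F^T \\ F & G \end{pmatrix}, \]
where $E$, $F$, and $G$ are submatrices of dimensions $p \times p$, $(m-p) \times p$, and $(m-p) \times (m-p)$, respectively. We need the following lemma.

Lemma 9. [20, Appendix A.5.5, p. 651] $Q^T D Q \succeq 0$ if and only if
\begin{align*}
& G \succeq 0, \\
& E - F^T G^+ F \succeq 0, \text{ and} \\
& (I_{m-p} - G G^+) F = 0,
\end{align*}
where $G^+$ is the pseudo-inverse or Moore-Penrose inverse of $G$ [20, Appendix A.5.4, p. 649].

Let
\[ T \triangleq \begin{pmatrix} T_1 \\ T_2 \end{pmatrix} \triangleq \begin{pmatrix} I_p & -F^T G^+ \\ 0 & I_{m-p} \end{pmatrix} Q^T, \]
where $T_1$ is a $p \times m$ matrix. Using this, we obtain a transformed problem in which the transformed sources are
\[ \tilde X \triangleq \begin{pmatrix} \tilde X_1 \\ \tilde X_2 \end{pmatrix} \triangleq \begin{pmatrix} T_1 X \\ T_2 X \end{pmatrix} = T X \quad\text{and}\quad \tilde Y \triangleq \begin{pmatrix} \tilde Y_1 \\ \tilde Y_2 \end{pmatrix} \triangleq \begin{pmatrix} T_1 Y \\ T_2 Y \end{pmatrix} = T Y. \]
Using Lemma 9, we obtain the transformed distortion matrix
\begin{align*}
\tilde D &\triangleq T D T^T \\
&= \begin{pmatrix} I_p & -F^T G^+ \\ 0 & I_{m-p} \end{pmatrix} Q^T D Q \begin{pmatrix} I_p & 0 \\ -G^+ F & I_{m-p} \end{pmatrix} \\
&= \begin{pmatrix} I_p & -F^T G^+ \\ 0 & I_{m-p} \end{pmatrix} \begin{pmatrix} E & F^T \\ F & G \end{pmatrix} \begin{pmatrix} I_p & 0 \\ -G^+ F & I_{m-p} \end{pmatrix} \\
&= \begin{pmatrix} E - F^T G^+ F & 0 \\ 0 & G \end{pmatrix} \\
&\triangleq \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix}.
\end{align*}
The covariance matrix of the transformed source $\tilde X$ is
\begin{align*}
K_{\tilde X} &= T K_X T^T \\
&= \begin{pmatrix} I_p & -F^T G^+ \\ 0 & I_{m-p} \end{pmatrix} Q^T K_X Q \begin{pmatrix} I_p & 0 \\ -G^+ F & I_{m-p} \end{pmatrix} \\
&= \begin{pmatrix} I_p & -F^T G^+ \\ 0 & I_{m-p} \end{pmatrix} \begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} I_p & 0 \\ -G^+ F & I_{m-p} \end{pmatrix} \\
&= \begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix},
\end{align*}
where
\[ \Sigma_1 = \mathrm{Diag}(\alpha_1, \ldots, \alpha_p), \]
and the covariance matrix of the transformed source $\tilde Y$ is
\begin{align*}
K_{\tilde Y} &= T K_Y T^T \\
&= \begin{pmatrix} I_p & -F^T G^+ \\ 0 & I_{m-p} \end{pmatrix} Q^T K_Y Q \begin{pmatrix} I_p & 0 \\ -G^+ F & I_{m-p} \end{pmatrix} \\
&= \begin{pmatrix} I_p & -F^T G^+ \\ 0 & I_{m-p} \end{pmatrix} \begin{pmatrix} Q_1^T K_Y Q_1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} I_p & 0 \\ -G^+ F & I_{m-p} \end{pmatrix} \\
&= \begin{pmatrix} Q_1^T K_Y Q_1 & 0 \\ 0 & 0 \end{pmatrix}.
\end{align*}
It follows that $\tilde X_2$ and $\tilde Y_2$ are deterministic, i.e.,
\[ \tilde X_2 = \tilde Y_2 = 0. \]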

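Since $T$ is invertible, the distortion constraint is equivalent to
\begin{align*}
T D T^T &\succeq \frac{1}{n} \sum_{i=1}^{n} E\left[ \left(\tilde X^n(i) - \hat{\tilde X}^n(i)\right) \left(\tilde X^n(i) - \hat{\tilde X}^n(i)\right)^T \right] \tag{67} \\
&= \frac{1}{n} \sum_{i=1}^{n} E\left[ \begin{pmatrix} \tilde X_1^n(i) - \hat{\tilde X}_1^n(i) \\ \tilde X_2^n(i) - \hat{\tilde X}_2^n(i) \end{pmatrix} \begin{pmatrix} \tilde X_1^n(i) - \hat{\tilde X}_1^n(i) \\ \tilde X_2^n(i) - \hat{\tilde X}_2^n(i) \end{pmatrix}^T \right], \tag{68}
\end{align*}
where $\hat{\tilde X}^n(i)$ denotes the decoder's estimate of $\tilde X^n(i)$. Since $D_1$ and $D_2$ are positive semidefinite from Lemma 9 and $\tilde X_2$ is deterministic, (67) and (68) imply that the distortion constraint is equivalent to
\[ D_1 \succeq \frac{1}{n} \sum_{i=1}^{n} E\left[ \left(\tilde X_1^n(i) - \hat{\tilde X}_1^n(i)\right) \left(\tilde X_1^n(i) - \hat{\tilde X}_1^n(i)\right)^T \right]. \]
Since $T$ is invertible, the above transformation is information lossless, and hence the transformed problem is equivalent to the original problem. Moreover, the transformed problem is effectively $p$-dimensional with the sources $\tilde X_1$ and $\tilde Y_1$, and the distortion matrix $D_1$ such that
\begin{align*}
K_{\tilde X_1} &= \Sigma_1 \succ 0 \text{ and} \\
\tilde X_1 &= \tilde Y_1 + \tilde N_1,
\end{align*}
where $\tilde N_1 = T_1 N$. We therefore have that
\begin{align*}
\mathcal{R}(K_X, K_Y, D, \mu) &= \mathcal{R}(K_{\tilde X_1}, K_{\tilde Y_1}, D_1, \mu) \text{ and} \tag{69} \\
\mathcal{R}_G(K_X, K_Y, D, \mu) &= \mathcal{R}_G(K_{\tilde X_1}, K_{\tilde Y_1}, D_1, \mu). \tag{70}
\end{align*}
Since $K_{\tilde X_1}$ is positive definite, if $D_1$ is singular, then the right-hand sides of (69) and (70) are both infinite, so the conclusion trivially holds. Otherwise, we have that $K_{\tilde X_1}$ and $D_1$ are positive definite and $K_{\tilde Y_1}$ is positive semidefinite. In that case Theorem 4 implies that
\[ \mathcal{R}(K_{\tilde X_1}, K_{\tilde Y_1}, D_1, \mu) = \mathcal{R}_G(K_{\tilde X_1}, K_{\tilde Y_1}, D_1, \mu). \]
This together with (69) and (70) establishes the desired equality
\[ \mathcal{R}(K_X, K_Y, D, \mu) = \mathcal{R}_G(K_X, K_Y, D, \mu). \]
Theorem 5 is thus proved. □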

Appendix A: Proof of Equality in Lemma [1] 

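Suppose $\mu$ is in $[0,1]$. Then for any $(U,V)$ in $\mathcal{S}$, we have
\begin{align*}
\mu I(X; U|V) + I(Y; V) &= \mu I(X; U, V) - \mu I(X; V) + I(Y; V) \\
&= \mu I(X; U) + \mu I(X; V|U) + \mu\left[I(Y; V) - I(X; V)\right] + (1 - \mu) I(Y; V) \\
&\ge \mu I(X; U) \tag{71} \\
&= \frac{\mu}{2}\log\frac{|K_X|}{|K_{X|U}|},
\end{align*}
where (71) follows because of the facts that
\[ I(Y; V) \ge 0 \quad\text{and}\quad I(X; V|U) \ge 0, \]
and we have
\[ I(Y; V) - I(X; V) \ge 0 \]
because of the data processing inequality [21, Theorem 2.8.1] and the Markov chain $X \leftrightarrow Y \leftrightarrow V$. The inequality (71) is achieved by any $(U,V)$ in $\mathcal{S}$ such that $V$ is independent of $(X, Y, U)$, and the conditional covariance of $X$ given $(U,V)$ satisfies
\[ 0 \preceq K_{X|U,V} = K_{X|U} \preceq D. \]
Since conditioning reduces covariance in a positive semidefinite sense, we have an additional constraint
\[ K_{X|U} \preceq K_X. \]
We therefore have the following:
\begin{align*}
\mathcal{R}_G(D, \mu) &= \min_{(R_1, R_2) \in \mathcal{R}_G(D)} \mu R_1 + R_2 \\
&= \min_{(U,V) \in \mathcal{S}} \mu I(X; U|V) + I(Y; V) \\
&= \min_{K_{X|U}} \ \frac{\mu}{2}\log\frac{|K_X|}{|K_{X|U}|} \\
&\qquad \text{subject to } K_X \succeq K_{X|U} \succeq 0 \text{ and } D \succeq K_{X|U} \\
&= v(P_{pt-pt}).
\end{align*}
Suppose now that $\mu > 1$. Then any $(U,V)$ in $\mathcal{S}$ can be characterized by positive semidefinite conditional covariance matrices $K_{Y|V}$ and $K_{X|U,V}$ such that
\begin{align*}
& K_Y \succeq K_{Y|V} \succeq 0, \\
& K_{Y|V} + K_N \succeq K_{X|U,V} \succeq 0, \\
& D \succeq K_{X|U,V},
\end{align*}
and
\begin{align*}
I(X; U|V) &= \frac{1}{2}\log\frac{|K_{Y|V} + K_N|}{|K_{X|U,V}|}, \\
I(Y; V) &= \frac{1}{2}\log\frac{|K_Y|}{|K_{Y|V}|}.
\end{align*}
In this case, we have
\begin{align*}
\mathcal{R}_G(D, \mu) &= \min_{(R_1, R_2) \in \mathcal{R}_G(D)} \mu R_1 + R_2 \\
&= \min_{(U,V) \in \mathcal{S}} \mu I(X; U|V) + I(Y; V) \\
&= \min_{K_{Y|V}, K_{X|U,V}} \ \frac{\mu}{2}\log\frac{|K_{Y|V} + K_N|}{|K_{X|U,V}|} + \frac{1}{2}\log\frac{|K_Y|}{|K_{Y|V}|} \\
&\qquad \text{subject to } K_Y \succeq K_{Y|V} \succeq 0, \\
&\qquad\qquad K_{Y|V} + K_N \succeq K_{X|U,V} \succeq 0, \text{ and} \\
&\qquad\qquad D \succeq K_{X|U,V}.
\end{align*}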

Appendix B: Proof of Lemma 2



We will be using several results and terms from Bertsekas et al. [22]. The book contains all of the background that these results need. The proof of the lemma is partially similar to that of Lemma 5 in [3]. Let us first introduce some notation used in the proof. We use $\mathrm{vec}(A_1, A_2)$ to denote the column vector created by the concatenation of the columns of $m \times m$ matrices $A_1$ and $A_2$. If $a = \mathrm{vec}(A_1, A_2)$, then we use the notation $\mathrm{mat}(a)$ to denote the inverse operation to get back the pair $(A_1, A_2)$, i.e.,
\[ \mathrm{mat}(a) = (A_1, A_2). \]
The set of all column vectors created by the concatenation of the columns of $m \times m$ symmetric matrices $A_1$ and $A_2$ is denoted by $\mathcal{A}$, i.e.,
\[ \mathcal{A} \triangleq \{\mathrm{vec}(A_1, A_2) : A_i = A_i^T \text{ for all } i \in \{1,2\}\}. \]
$\mathrm{ri}(\mathcal{B})$ is used to denote the relative interior of the set $\mathcal{B}$. The sum of the two vector sets $\mathcal{V}_1$ and $\mathcal{V}_2$ is denoted by $\mathcal{V}_1 + \mathcal{V}_2$ and is defined as
\[ \mathcal{V}_1 + \mathcal{V}_2 \triangleq \{v_1 + v_2 : v_i \in \mathcal{V}_i \text{ for all } i \in \{1,2\}\}. \]
We also need the following facts from linear algebra.

Lemma 10. (a) If $E$ is an $m \times n$ matrix and $F$ is an $n \times m$ matrix, then $\mathrm{Tr}(EF) = \mathrm{Tr}(FE)$.

(b) If $E$ and $F$ are positive semidefinite, then $EF = 0$ if and only if $\mathrm{Tr}(EF) = 0$.

Proof. Part (a) immediately follows from the definition of the $\mathrm{Tr}(\cdot)$ function. Part (b) can be proved using the eigen decompositions of $E$ and $F$. □

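We can express the problem $(P_{G2})$ as
\begin{align*}
\min_{b} \ & h(b) \\
\text{subject to } & b \in \mathcal{B},
\end{align*}
where $b \triangleq \mathrm{vec}(B_1, B_2)$,
\[ h(b) \triangleq \frac{\mu}{2}\log\frac{|K_X - B_2|}{|K_X - B_1 - B_2|} + \frac{1}{2}\log\frac{|K_Y|}{|K_Y - B_2|}, \]
and the feasible set $\mathcal{B}$ is written as
\[ \mathcal{B} \triangleq \mathcal{B}_1 \cap \mathcal{B}_2 \cap \mathcal{B}_{12}, \]
where for $i \in \{1,2\}$
\[ \mathcal{B}_i \triangleq \{\mathrm{vec}(B_1, B_2) : B_i \succeq 0\} \cap \mathcal{A} \]
and
\[ \mathcal{B}_{12} \triangleq \{\mathrm{vec}(B_1, B_2) : B_1 + B_2 \succeq K_X - D\} \cap \mathcal{A}. \]
Since $h(\cdot)$ is continuously differentiable, it follows from [22, Proposition 4.7.1, p. 255] that any local minimum $b^*$ must satisfy
\[ -\nabla h(b^*) \in T_{\mathcal{B}}(b^*)^*, \tag{72} \]
where $\nabla h(b^*)$ is the gradient of $h(\cdot)$ at $b^*$, and $T_{\mathcal{B}}(b^*)^*$ is the polar cone of the tangent cone $T_{\mathcal{B}}(b^*)$ of $\mathcal{B}$ at $b^*$. Now since $\mathcal{B}_i$ for all $i \in \{1,2\}$ and $\mathcal{B}_{12}$ are nonempty convex sets and $\mathrm{ri}(\mathcal{B}_1) \cap \mathrm{ri}(\mathcal{B}_2) \cap \mathrm{ri}(\mathcal{B}_{12})$ is nonempty, it follows from [22, Problem 4.23, p. 267] and [22, Proposition 4.6.3, p. 254] that
\[ T_{\mathcal{B}}(b^*)^* = T_{\mathcal{B}_1}(b^*)^* + T_{\mathcal{B}_2}(b^*)^* + T_{\mathcal{B}_{12}}(b^*)^*. \tag{73} \]
We next show that
\[ -\nabla h(b^*) \in T_{\mathcal{B}_1}(b^*)^* \cap \mathcal{A} + T_{\mathcal{B}_2}(b^*)^* \cap \mathcal{A} + T_{\mathcal{B}_{12}}(b^*)^* \cap \mathcal{A}. \tag{74} \]
Note that $-\nabla h(b^*)$ is a column concatenation of two $m \times m$ symmetric matrices. This together with (72) and (73) yields
\[ -\nabla h(b^*) = z_1 + z_2 + z_{12} \in \mathcal{A}, \tag{75} \]
where for $i \in \{1,2\}$
\begin{align*}
z_i &\in T_{\mathcal{B}_i}(b^*)^* \text{ and} \\
z_{12} &\in T_{\mathcal{B}_{12}}(b^*)^*.
\end{align*}
Let us now define
\begin{align*}
(K_i, L_i) &\triangleq \mathrm{mat}(z_i), \ \forall i \in \{1,2\} \text{ and} \\
(K_{12}, L_{12}) &\triangleq \mathrm{mat}(z_{12}).
\end{align*}
Using this, we define
\begin{align*}
\bar z_i &\triangleq \mathrm{vec}\left(\tfrac{1}{2}(K_i + K_i^T), \tfrac{1}{2}(L_i + L_i^T)\right), \ \forall i \in \{1,2\} \text{ and} \\
\bar z_{12} &\triangleq \mathrm{vec}\left(\tfrac{1}{2}(K_{12} + K_{12}^T), \tfrac{1}{2}(L_{12} + L_{12}^T)\right).
\end{align*}
Since $\mathcal{B}_1$ is a nonempty convex set, it follows from [22, Proposition 4.6.3, p. 254] that
\[ z_1^T (b - b^*) \le 0, \ \forall b \in \mathcal{B}_1. \tag{76} \]
Consider any $b \in \mathcal{B}_1$. Let
\[ (E_1, F_1) \triangleq \mathrm{mat}(b - b^*). \]
We now obtain
\begin{align*}
\bar z_1^T (b - b^*) &= \tfrac{1}{2}\mathrm{Tr}\left((K_1 + K_1^T) E_1\right) + \tfrac{1}{2}\mathrm{Tr}\left((L_1 + L_1^T) F_1\right) \\
&= \mathrm{Tr}(K_1 E_1) + \mathrm{Tr}(L_1 F_1) \tag{77} \\
&= z_1^T (b - b^*) \\
&\le 0, \tag{78}
\end{align*}
where

(77) follows because $E_1$ and $F_1$ are symmetric, and

(78) follows from (76).

By definition, $\bar z_1 \in \mathcal{A}$. This and (78) imply that
\[ \bar z_1 \in T_{\mathcal{B}_1}(b^*)^* \cap \mathcal{A}. \tag{79} \]
We can similarly show that
\begin{align*}
\bar z_2 &\in T_{\mathcal{B}_2}(b^*)^* \cap \mathcal{A} \text{ and} \tag{80} \\
\bar z_{12} &\in T_{\mathcal{B}_{12}}(b^*)^* \cap \mathcal{A}. \tag{81}
\end{align*}
Now
\begin{align*}
\bar z_1 + \bar z_2 + \bar z_{12} &= \mathrm{vec}\left(\tfrac{1}{2}\left(K_1 + K_2 + K_{12} + K_1^T + K_2^T + K_{12}^T\right), \tfrac{1}{2}\left(L_1 + L_2 + L_{12} + L_1^T + L_2^T + L_{12}^T\right)\right) \\
&= \mathrm{vec}\left((K_1 + K_2 + K_{12}), (L_1 + L_2 + L_{12})\right) \tag{82} \\
&= z_1 + z_2 + z_{12} \\
&= -\nabla h(b^*), \tag{83}
\end{align*}
where

(82) follows because $K_1 + K_2 + K_{12}$ and $L_1 + L_2 + L_{12}$ are symmetric from (75), and

(83) follows from the equality in (75).

This together with (79) through (81) implies (74).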

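We now proceed to characterize the right-hand side of (74). Consider any $z \in T_{\mathcal{B}_1}(b^*)^* \cap \mathcal{A}$. It again follows from [22, Proposition 4.6.3, p. 254] that
\[ z^T (b - b^*) \le 0, \ \forall b \in \mathcal{B}_1. \tag{84} \]
Let us define
\begin{align*}
(M_1, M_2) &\triangleq \mathrm{mat}(z), \\
(B_1, B_2) &\triangleq \mathrm{mat}(b), \text{ and} \\
(B_1^*, B_2^*) &\triangleq \mathrm{mat}(b^*).
\end{align*}
Then (84) can be re-written as
\[ \sum_{i=1}^{2} \mathrm{Tr}\left(M_i (B_i - B_i^*)\right) \le 0, \ \forall\, \mathrm{vec}(B_1, B_2) \in \mathcal{B}_1. \tag{85} \]
We first show that $M_2 = 0$. Let us pick $(B_1, B_2) = (B_1^*, B_2^* + M_2)$. This means that
\[ \mathrm{Tr}(M_2 M_2) \le 0, \]
which implies that $M_2 = 0$ because $M_2$ is symmetric. We next prove that $M_1$ is negative semidefinite. Suppose there exists $w \ne 0$ such that $w^T M_1 w > 0$. We then have
\[ 0 < w^T M_1 w = \mathrm{Tr}(w^T M_1 w) = \mathrm{Tr}(M_1 w w^T), \]
where the last equality follows from Lemma 10(a). But this contradicts (85) because $\mathrm{vec}(B_1^* + w w^T, B_2^*) \in \mathcal{B}_1$, and hence $M_1 \preceq 0$. We finally show that $M_1 B_1^* = 0$. Let $(B_1, B_2) = (a B_1^*, B_2^*)$, where $a > 1$. Then (85) implies that
\[ \mathrm{Tr}(M_1 B_1^*) \le 0. \]
Likewise, on picking $0 < a < 1$, we obtain
\[ \mathrm{Tr}(M_1 B_1^*) \ge 0. \]
Both together establish
\[ \mathrm{Tr}(M_1 B_1^*) = 0, \]
which together with Lemma 10(b) implies that
\[ M_1 B_1^* = 0 \]
because $-M_1$ and $B_1^*$ are positive semidefinite. We therefore have that
\[ T_{\mathcal{B}_1}(b^*)^* \cap \mathcal{A} \subseteq \{\mathrm{vec}(M_1, 0) : M_1 \preceq 0 \text{ and } M_1 B_1^* = 0\}. \tag{86} \]
Similarly, we can show that
\[ T_{\mathcal{B}_2}(b^*)^* \cap \mathcal{A} \subseteq \{\mathrm{vec}(0, M_2) : M_2 \preceq 0 \text{ and } M_2 B_2^* = 0\}. \tag{87} \]
Consider any $z \in T_{\mathcal{B}_{12}}(b^*)^* \cap \mathcal{A}$. As before, we obtain
\[ \sum_{i=1}^{2} \mathrm{Tr}\left(\Lambda_i (B_i - B_i^*)\right) \le 0, \ \forall\, \mathrm{vec}(B_1, B_2) \in \mathcal{B}_{12}, \tag{88} \]
where
\[ (\Lambda_1, \Lambda_2) \triangleq \mathrm{mat}(z). \]
On picking $(B_1, B_2) = (B_1^* + \Lambda_1, B_2^* - \Lambda_1)$, (88) yields
\[ \mathrm{Tr}(\Lambda_1 \Lambda_1) - \mathrm{Tr}(\Lambda_2 \Lambda_1) \le 0. \]
Similarly, picking $(B_1, B_2) = (B_1^* - \Lambda_2, B_2^* + \Lambda_2)$ gives
\[ \mathrm{Tr}(\Lambda_2 \Lambda_2) - \mathrm{Tr}(\Lambda_1 \Lambda_2) \le 0. \]
Both together imply that
\[ \mathrm{Tr}\left((\Lambda_1 - \Lambda_2)(\Lambda_1 - \Lambda_2)\right) \le 0, \]
and therefore
\[ \Lambda_1 - \Lambda_2 = 0, \]
because $\Lambda_1$ and $\Lambda_2$ are symmetric. Let us denote $\Lambda_1$ and $\Lambda_2$ by $\Lambda$. As before, we can show that $\Lambda \preceq 0$. We next prove that
\[ \mathrm{Tr}\left(\Lambda (B_1^* + B_2^* - K_X + D)\right) = 0. \]
Observe that $(B_1, B_2) = \left(a(B_1^* + B_2^* - K_X + D) + K_X - D - B_2^*, B_2^*\right)$, where $a > 0$, is a valid choice of $(B_1, B_2)$ in (88). For $a > 1$, this implies
\[ \mathrm{Tr}\left(\Lambda (B_1^* + B_2^* - K_X + D)\right) \le 0, \]
and for $0 < a < 1$, it gives
\[ \mathrm{Tr}\left(\Lambda (B_1^* + B_2^* - K_X + D)\right) \ge 0. \]
Therefore
\[ \mathrm{Tr}\left(\Lambda (B_1^* + B_2^* - K_X + D)\right) = 0. \]
This and Lemma 10(b) imply that
\[ \Lambda (B_1^* + B_2^* - K_X + D) = 0. \]
We thus have that
\[ T_{\mathcal{B}_{12}}(b^*)^* \cap \mathcal{A} \subseteq \{\mathrm{vec}(\Lambda, \Lambda) : \Lambda \preceq 0 \text{ and } \Lambda (B_1^* + B_2^* - K_X + D) = 0\}. \tag{89} \]
It now follows from (74), (86), (87), and (89) that $\nabla h(b^*)$ can be written as
\[ \nabla h(b^*) = \mathrm{vec}(M_1 + \Lambda, M_2 + \Lambda) \]
for some $M_1$, $M_2$, and $\Lambda$ such that
\begin{align*}
& M_i B_i^* = 0 \text{ for all } i \in \{1,2\}, \\
& \Lambda (B_1^* + B_2^* - K_X + D) = 0, \text{ and} \\
& M_1, M_2, \Lambda \succeq 0.
\end{align*}
Lemma 2 now follows because
\[ \nabla h(b^*) = \mathrm{vec}\left(\frac{\mu}{2}(K_X - B_1^* - B_2^*)^{-1},\ \frac{\mu}{2}(K_X - B_1^* - B_2^*)^{-1} - \frac{\mu}{2}(K_X - B_2^*)^{-1} + \frac{1}{2}(K_Y - B_2^*)^{-1}\right). \]

Appendix C: Proof of Lemma 3

Using (12), we obtain
\begin{align*}
\Lambda^* &= \frac{\mu}{2}(K_X - B_2^*)^{-1} - M_1^* \\
&= (K_X - B_2^*)^{-1} \left[\frac{\mu}{2}(K_X - B_2^*) - (K_X - B_2^*) M_1^* (K_X - B_2^*)\right] (K_X - B_2^*)^{-1}.
\end{align*}
It is hence sufficient to show that
\[ \frac{\mu}{2}(K_X - B_2^*) - (K_X - B_2^*) M_1^* (K_X - B_2^*) \]
is positive semidefinite. On pre- and post-multiplying (7) by $K_X - B_1^* - B_2^*$, we obtain
\[ \frac{\mu}{2}(K_X - B_1^* - B_2^*) - (K_X - B_1^* - B_2^*)(M_1^* + \Lambda^*)(K_X - B_1^* - B_2^*) = 0. \tag{90} \]
Using (9) and (10), we have
\begin{align*}
(K_X - B_1^* - B_2^*) M_1^* (K_X - B_1^* - B_2^*) &= (K_X - B_2^*) M_1^* (K_X - B_2^*) \text{ and} \tag{91} \\
(K_X - B_1^* - B_2^*) \Lambda^* (K_X - B_1^* - B_2^*) &= D \Lambda^* D. \tag{92}
\end{align*}
Now (90) through (92) together imply that
\[ \frac{\mu}{2}(K_X - B_2^*) - (K_X - B_2^*) M_1^* (K_X - B_2^*) = \frac{\mu}{2} B_1^* + D \Lambda^* D, \]
which is a positive semidefinite matrix.

We next show that $\Lambda^*$ is nonzero. Suppose otherwise that
\[ \Lambda^* = 0. \]
This together with (12) implies that
\begin{align*}
M_1^* &= \frac{\mu}{2}(K_X - B_2^*)^{-1} \succ 0 \text{ and} \\
M_2^* &= \frac{1}{2}(K_Y - B_2^*)^{-1} \succ 0,
\end{align*}
i.e., $M_1^*$ and $M_2^*$ are positive definite. It now follows from (9) that
\[ B_1^* = B_2^* = 0, \]
which is a contradiction because $(0, 0)$ is not feasible for the optimization problem $(P_{G2})$ by (1).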

Appendix D: Proof of Lemma 4



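It is clear by definition that $\bar B_1^*$, $\bar B_2^*$, $\bar M_1^*$, and $\bar M_2^*$ are positive semidefinite matrices. To prove (36), we use the first equality in (12) and obtain
\begin{align*}
S S^T &= \Lambda^* \\
&= \frac{\mu}{2}(K_X - B_2^*)^{-1} - M_1^* \\
&= \frac{\mu}{2}[S,T]\left([S,T]^T (K_X - B_2^*) [S,T]\right)^{-1} [S,T]^T - M_1^* \tag{93} \\
&= \frac{\mu}{2}[S,T] \begin{pmatrix} S^T (K_X - B_2^*) S & 0 \\ 0 & T^T (K_X - B_2^*) T \end{pmatrix}^{-1} [S,T]^T - M_1^* \tag{94} \\
&= \frac{\mu}{2}[S,T] \begin{pmatrix} \left(S^T (K_X - B_2^*) S\right)^{-1} & 0 \\ 0 & \left(T^T (K_X - B_2^*) T\right)^{-1} \end{pmatrix} [S,T]^T - M_1^* \\
&= \frac{\mu}{2} S \left(S^T (K_X - B_2^*) S\right)^{-1} S^T + \frac{\mu}{2} T \left(T^T (K_X - B_2^*) T\right)^{-1} T^T - M_1^*, \tag{95}
\end{align*}
where

(93) follows because $[S,T]$ is invertible, and

(94) follows because $S$ and $T$ are cross $(K_X - B_2^*)$-orthogonal.

On pre- and post-multiplying (95) by $S^T (K_X - B_2^*)$ and $(K_X - B_2^*) S$, respectively, and again using the fact that $S$ and $T$ are cross $(K_X - B_2^*)$-orthogonal, we obtain
\[ \left(S^T (K_X - B_2^*) S\right)\left(S^T (K_X - B_2^*) S\right) = \frac{\mu}{2}\left(S^T (K_X - B_2^*) S\right) - S^T (K_X - B_2^*) M_1^* (K_X - B_2^*) S, \]
which is equivalent to
\[ I_r = \frac{\mu}{2}\left(S^T (K_X - B_2^*) S\right)^{-1} - \left(S^T (K_X - B_2^*) S\right)^{-1} S^T (K_X - B_2^*) M_1^* (K_X - B_2^*) S \left(S^T (K_X - B_2^*) S\right)^{-1}. \tag{96} \]
Similarly, using the second equality in (12) together with the facts that $[S,W]$ is invertible and $S$ and $W$ are cross $(K_Y - B_2^*)$-orthogonal, we obtain
\[ I_r = \frac{1}{2}\left(S^T (K_Y - B_2^*) S\right)^{-1} - \left(S^T (K_Y - B_2^*) S\right)^{-1} S^T (K_Y - B_2^*) M_2^* (K_Y - B_2^*) S \left(S^T (K_Y - B_2^*) S\right)^{-1}. \tag{97} \]
Now (96) and (97) together can be written as
\[ I_r = \frac{\mu}{2}(K_{\bar X} - \bar B_2^*)^{-1} - \bar M_1^* = \frac{1}{2}(K_{\bar Y} - \bar B_2^*)^{-1} - \bar M_2^*. \tag{98} \]
This proves (36).

To prove (37), we have from (9) and (14) that
\[ B_1^* a_i = 0 \]
for all $i$ in $\{1, 2, \ldots, p\}$. Since the columns of $T$ are in $\mathrm{span}\{a_i\}_{i=1}^{p}$, we have
\[ B_1^* T = 0. \]
This and (9) together imply
\[ B_1^* \left(M_1^* - \frac{\mu}{2} T \left(T^T (K_X - B_2^*) T\right)^{-1} T^T\right) = 0. \]
We now use (95) and obtain
\[ B_1^* \left(\frac{\mu}{2} S \left(S^T (K_X - B_2^*) S\right)^{-1} S^T - S S^T\right) = 0, \]
which can be re-written as
\[ B_1^* S \left(\frac{\mu}{2}(K_{\bar X} - \bar B_2^*)^{-1} - I_r\right) S^T = 0. \]
Using the first equality in (98) yields
\[ B_1^* S \bar M_1^* S^T = 0. \]
We next invoke Lemma 10(b) to obtain
\[ \mathrm{Tr}(B_1^* S \bar M_1^* S^T) = 0. \]
Using Lemma 10(a) gives
\[ \mathrm{Tr}(S^T B_1^* S \bar M_1^*) = 0, \]
which is equivalent to
\[ \mathrm{Tr}(\bar B_1^* \bar M_1^*) = 0. \]
Since $\bar B_1^*$ and $\bar M_1^*$ are positive semidefinite, by invoking Lemma 10(b) again, we obtain
\[ \bar B_1^* \bar M_1^* = 0. \]
The proof of
\[ \bar B_2^* \bar M_2^* = 0 \]
is exactly similar. This proves (37). The proof of (38) is immediate from (16).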



Appendix E: Proof of Lemma 5

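The proofs of (41) and (42) are easy. They follow from (36), (39), and (40). Since $\mu > 1$, (41) and (42) imply that
\[ K_{\tilde X} \succ K_{\tilde Y}. \]
$K_{\bar X}$ and $K_{\bar Y}$ are positive definite by definition. Since $\bar M_1^*$ and $\bar M_2^*$ are positive semidefinite,
\[ K_{\tilde X} \succeq K_{\bar X} \quad\text{and}\quad K_{\tilde Y} \succeq K_{\bar Y} \]
follow from (39) and (40), respectively. This proves (43) and (44). To prove (45), we have
\begin{align*}
\frac{|K_{\bar Y}|}{|K_{\bar Y} - \bar B_2^*|} &= \frac{|K_{\bar Y} - \bar B_2^* + \bar B_2^*|}{|K_{\bar Y} - \bar B_2^*|} \\
&= \left|I_r + \bar B_2^* (K_{\bar Y} - \bar B_2^*)^{-1}\right| \\
&= \left|I_r + \bar B_2^* \left[(K_{\bar Y} - \bar B_2^*)^{-1} - 2 \bar M_2^*\right]\right| \tag{99} \\
&= \left|I_r + \bar B_2^* (K_{\tilde Y} - \bar B_2^*)^{-1}\right| \tag{100} \\
&= \frac{|K_{\tilde Y}|}{|K_{\tilde Y} - \bar B_2^*|},
\end{align*}
where

(99) follows from (37), and

(100) follows from (40).

To prove (46), we proceed similarly and obtain
\begin{align*}
\frac{|K_{\bar X} - \bar B_2^*|}{|K_{\bar X} - \bar B_1^* - \bar B_2^*|} &= \frac{|I_r|}{\left|I_r - \bar B_1^* (K_{\bar X} - \bar B_2^*)^{-1}\right|} \\
&= \frac{|I_r|}{\left|I_r - \bar B_1^* \left[(K_{\bar X} - \bar B_2^*)^{-1} - \frac{2}{\mu} \bar M_1^*\right]\right|} \tag{101} \\
&= \frac{|I_r|}{\left|I_r - \bar B_1^* (K_{\tilde X} - \bar B_2^*)^{-1}\right|} \tag{102} \\
&= \frac{|K_{\tilde X} - \bar B_2^*|}{|K_{\tilde X} - \bar B_1^* - \bar B_2^*|},
\end{align*}
where

(101) follows from (37), and

(102) follows from (39).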
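Both chains above rest on the determinant identity $|K| / |K - B| = |I + B (K - B)^{-1}|$. The following minimal sketch verifies it in exact rational arithmetic on a hypothetical 2 x 2 pair `K`, `B`; the values are illustrative only and are not taken from the paper.

```python
from fractions import Fraction


def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def inv2(M):
    """Inverse of a nonsingular 2x2 matrix."""
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d, M[0][0] / d]]


def matmul2(P, R):
    """Product of two 2x2 matrices."""
    return [[sum(P[i][k] * R[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def sub2(P, R):
    """Entrywise difference P - R of two 2x2 matrices."""
    return [[P[i][j] - R[i][j] for j in range(2)] for i in range(2)]


def add_identity(P):
    """Return I + P for a 2x2 matrix P."""
    return [[P[i][j] + (1 if i == j else 0) for j in range(2)]
            for i in range(2)]


# Hypothetical covariance K and positive semidefinite B with K - B > 0.
K = [[Fraction(3), Fraction(1)], [Fraction(1), Fraction(2)]]
B = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(0)]]

K_minus_B = sub2(K, B)
ratio = det2(K) / det2(K_minus_B)
identity_side = det2(add_identity(matmul2(B, inv2(K_minus_B))))
print(ratio, identity_side)
```

Both sides evaluate to 5/3 here. The identity follows from writing $K = (K - B) + B$ and factoring out $K - B$ on the right.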



Appendix F: Proof of Lemma [6] 

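We have
\begin{align*}
h(\tilde X|U,V) &\le \frac{1}{2}\log\left((2\pi e)^r |K_{\tilde X|U,V}|\right) \tag{103} \\
&\le \frac{1}{2}\log\left((2\pi e)^r |\tilde D|\right), \tag{104}
\end{align*}
where

(103) follows from the fact that the Gaussian distribution maximizes the differential entropy for a given covariance matrix [21, Theorem 8.6.5], and

(104) follows from the distortion constraint in the definition of $(\tilde P_1)$ and the concavity of the $\log|\cdot|$ function.

Inequalities (103) and (104) are equalities if $\tilde X$, $U$, and $V$ are jointly Gaussian with the conditional covariance matrix $K_{\tilde X|U,V}$ such that
\[ K_{\tilde X|U,V} = \tilde D = K_{\tilde X} - \bar B_1^* - \bar B_2^*, \tag{105} \]
where the last equality follows from (47). We thus conclude that a Gaussian $(U,V)$ with the conditional covariance matrix satisfying (105) is optimal for the subproblem $(\tilde P_1)$, and the optimal value is
\begin{align*}
v(\tilde P_1) &= \mu h(\tilde X) - \frac{\mu}{2}\log\left((2\pi e)^r |\tilde D|\right) \\
&= \frac{\mu}{2}\log\left((2\pi e)^r |K_{\tilde X}|\right) - \frac{\mu}{2}\log\left((2\pi e)^r |\tilde D|\right).
\end{align*}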



Appendix G: Proof of Lemma 7

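Since conditioned on $V$, $\tilde{Y}$ and $\tilde{N}$ are independent, we use the vector EPI [11, Theorem 17.7.3] to obtain

\begin{align}
h(\tilde{Y}|V) - \mu h(\tilde{X}|V) &= h(\tilde{Y}|V) - \mu h(\tilde{Y} + \tilde{N}|V) \notag \\
&\le h(\tilde{Y}|V) - \frac{\mu r}{2}\log\left(2^{\frac{2}{r} h(\tilde{Y}|V)} + 2^{\frac{2}{r} h(\tilde{N})}\right). \tag{106}
\end{align}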



The inequality (106) is an equality if $\tilde{Y}$ and $V$ are jointly Gaussian and the conditional covariance matrix satisfies

$$K_{\tilde{Y}|V} = a K_{\tilde{N}}$$

for some constant $a > 0$. By following standard calculus arguments, we can show that for $\mu > 1$ the right-hand side of (106) is concave in $h(\tilde{Y}|V)$ and has a global maximum at

$$h(\tilde{Y}|V) = h(\tilde{N}) - \frac{r}{2}\log(\mu - 1). \tag{107}$$
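The calculus step behind (107) can be checked numerically. The sketch below treats the right-hand side of (106) as a scalar function of $t = h(\tilde{Y}|V)$, with logarithms to base 2, and compares the closed-form maximizer with a grid search. The function names and the values of `mu`, `r`, and `h_n` are hypothetical and chosen only for illustration.

```python
import math

# Minimal sketch: the right-hand side of (106) as a function of
# t = h(Y|V), for fixed mu > 1, dimension r, and noise entropy h_n.
# Parameter values below are illustrative assumptions.

def rhs_106(t, mu, r, h_n):
    """t - (mu * r / 2) * log2(2^(2t/r) + 2^(2 h_n / r))."""
    return t - (mu * r / 2.0) * math.log2(
        2.0 ** (2.0 * t / r) + 2.0 ** (2.0 * h_n / r))

def maximizer(mu, r, h_n):
    """Closed-form stationary point from (107)."""
    return h_n - (r / 2.0) * math.log2(mu - 1.0)

mu, r, h_n = 2.5, 3, 1.2
t_star = maximizer(mu, r, h_n)
best = rhs_106(t_star, mu, r, h_n)

# Grid search around t_star; concavity means no grid point should beat it.
grid = [t_star + 0.01 * k for k in range(-300, 301)]
grid_max = max(rhs_106(t, mu, r, h_n) for t in grid)

print(t_star, best, grid_max)
```

Setting the derivative $1 - \mu\, 2^{2t/r} / (2^{2t/r} + 2^{2h(\tilde{N})/r})$ to zero gives $2^{2t/r}(\mu - 1) = 2^{2h(\tilde{N})/r}$, which is (107); the grid search agrees with this value for the parameters chosen here.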
Let $V_G$ and $\tilde{Y}$ be jointly Gaussian such that the conditional covariance matrix of $\tilde{Y}$ given $V_G$ is

$$K_{\tilde{Y}|V_G} = K_{\tilde{Y}} - B_2^*.$$

We next show that this achieves equality in (106) and satisfies (107) simultaneously. We have from (41) and (42) that

$$K_{\tilde{Y}} - B_2^* = (\mu - 1)^{-1} K_{\tilde{N}}, \tag{108}$$

i.e., the conditional covariance matrix $K_{\tilde{Y}|V_G}$ is proportional to $K_{\tilde{N}}$. Hence, (106) is satisfied with equality. Moreover, for this $V_G$, (107) and (108) are equivalent. Therefore,



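$$h(\tilde{Y}|V) - \mu h(\tilde{X}|V) \le \frac{1}{2}\log\left((2\pi e)^r |K_{\tilde{Y}} - B_2^*|\right) - \frac{\mu}{2}\log\left((2\pi e)^r |K_{\tilde{X}} - B_2^*|\right).$$

We thus conclude that $V_G$ is optimal for $(P_2)$ and the optimal value is

\begin{align}
v(P_2) &= \mu h(\tilde{X}) - h(\tilde{Y}) + \frac{1}{2}\log\left((2\pi e)^r |K_{\tilde{Y}} - B_2^*|\right) - \frac{\mu}{2}\log\left((2\pi e)^r |K_{\tilde{X}} - B_2^*|\right) \notag \\
&= \frac{\mu}{2}\log\left((2\pi e)^r |K_{\tilde{X}}|\right) - \frac{1}{2}\log\left((2\pi e)^r |K_{\tilde{Y}}|\right) + \frac{1}{2}\log\left((2\pi e)^r |K_{\tilde{Y}} - B_2^*|\right) - \frac{\mu}{2}\log\left((2\pi e)^r |K_{\tilde{X}} - B_2^*|\right) \notag \\
&= \frac{\mu}{2}\log\frac{|K_{\tilde{X}}|}{|K_{\tilde{X}} - B_2^*|} - \frac{1}{2}\log\frac{|K_{\tilde{Y}}|}{|K_{\tilde{Y}} - B_2^*|}. \notag
\end{align}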